PREFACE


An alternative title for this book which would
emphasize the point of view is Statistics, Its 
Philosophy and Method. This work is more than a 
revision of the writer's earlier Statistical 
Method (1923). That work covered a field which 
in the last twenty years has grown to such magni- 
tude that nothing short of an encyclopedia could 
cover it today. The endeavor herein has been to 
place a great emphasis upon the logic and prin- 
ciples underlying the statistical study of phe- 
nomena, to provide, in the early chapters, such 
basic issues as will integrate thoughtful and 
investigative moods with statistical processes, 
and, in the later chapters, to give such treat- 
ment of modern processes as is required in hand- 
ling many experimental situations and as will 
open to the reader the wealth of thought in cur- 
rent statistical literature. 

The early chapters constitute an elementary 
text, and call for no mathematical background 
other than that of arithmetic and elementary 
algebra. They seek, first, to show the place of 
statistics in the social process of solving prob- 
lems and, second, to provide the most basic 
tools. The approach to statistics of most be-
ginning students is that it holds some useful
tricks which, with a certain drudgery, can be
learned. The content of statistics is not a bag 
of tricks, but rational thought processes which 
serve in problem situations. That there is a
nicety in these processes gives zest to their 
discovery and pleasure in their use. This view- 
point can scarcely be overemphasized, and it is 
hoped that teachers of this text will further 
emphasize and elaborate upon the utility of sta- 
tistical concepts in the everyday problems of the 
physical and biological scientist, the schoolman, 
the doctor, the farmer, the economist, and the 
businessman. 

The parallel work of the author, Statistical 
Tables, is designed specifically for laboratory 
statisticians and is not a requisite to students
of this text in view of the abridged set of 
tables given herein. 

In addition to the numerous acknowledgments 
cited throughout this text, the writer is deeply 
indebted to Eric F. Gardner for statistical re-
search in connection with sequential analysis and
otherwise, and he is also greatly indebted to him
and to Katherine Ytredal for expert assistance in
the difficult task of preparation of tables and 
of the manuscript for the printer. 


Truman Lee Kelley 
Cambridge, Massachusetts 


FUNDAMENTALS OF STATISTICS


CHAPTER 1


THE DIGNITY OF DATA AND THE BACKGROUND 
OF STATISTICS 


SECTION 1 
STATISTICS, PSYCHOLOGY, AND LOGIC 


An isolated fact is an unthinkable phenomenon. 
Adults experience them and think of them after 
the occurrence, but not before. Further experi-
ences of the same sort (the impossibility of such
in an absolute sense is admitted, for a second
experience cannot be exactly the same as a first)
are no longer isolated. Infants presumably ex-
perience them in great number, making each new
day richer than the past in ability to interpret.
These new infringements upon consciousness are a
part of life which lies outside the field of 
statistics. Note the appropriateness of the 
plural, and that "statistic" is capable of defi- 
nition only after the plural has been defined. 
As a second experience of something, say a cat 
caterwauling, is different from the first, so 
the third is different from the second, etc. 
ad infinitum. There is no identity in successive 
experiences. In some sense each is isolated 
from all others, but in another and a psycho- 
logical sense it is not, and it is this latter 
fact that provides the basis for statistics. 


Let us attempt to trace the delimitation of 
attention of a person in an investigative mood. 
We, of course, cannot catch him at the beginning 
of any activity. At the time we first see him 
he is already possessed of many ideas, experi- 
ences, remembrances, interests, and attitudes, 
dating from the more or less remote past, and he 
is in a particular environment providing par-
ticular problems and data. The nature of the
investigation that he undertakes is limited by 
these things. If on a country road, and if in 
his background there is the training, the inter- 
est, and the specific information of a botanist, 
he may narrow his attention to the botanical 
features around, whereas if a geologist he may 
turn his attention to rocks and soils. This 
narrowing of the field of interest is necessary 
to contemplative and creative thinking. In sta- 
tistical language it is the first step in the 
making of a statistical series. The items of 
such a series do not encompass all that exists,
or even all that is knowable, about the things 
observed, but only so much about each as is ap- 
parent when the attention is limited to a speci- 
fied narrow field. 

A statistical series is a succession of ob-
served values referring to a single character of
a number of objects, things, or cases, which have
been selected according to some single principle.
In the situation wherein these cases are randomly
drawn members of a much larger, generally infi-
nite, parent population, the series is also a
sample.*

* For an excellent discussion of sampling see A. L. Bowley,
ELEMENTS OF STATISTICS, fifth ed., 1926, and also James G.
Smith and Acheson J. Duncan, SAMPLING STATISTICS AND APPLI-
CATIONS, which is vol. 2 of FUNDAMENTALS OF THE THEORY OF STA-
TISTICS, 1945.

The establishment of the single principle of
selection is the first step in methodical in-
vestigation. One cannot and should not try to
investigate everything at once. To attempt it is 
scatterbrained and unproductive. The original 
limits of interest of the botanist on the country 
road might be to living things and thence to 
vegetation. A further limitation might be to
trees. A still further one to maple trees, and 
thence to one characteristic of them, say leaves, 
and thence to, say, the veining of leaves. If,
after following this greater and greater delimi-
tation to a certain point, the investigator stops
and then observes that there remains a variable
phenomenon, say the number of veins, of a number
of things, leaves, all falling within the limits
of selection as he has defined them, he then has 
a statistical series, which in this illustration 
is also a sample, for the leaves noted may be 
thought of as a finite random selection from an 
infinite population of maple leaves. 

Had he stopped earlier in the process of de- 
limitation, say at the point when his interest 
was merely limited to living things, would he 
then have had a statistical series? The items 
might be the height of a pine tree, the bark of 
a dog, the speed of a pony, etc. Though these
be under a single principle (living things as
encountered along the highway), it is not enough,
for there is also needed a single principle of 
observation. Had he observed the height of a 
pine tree, the height of a dog, of a pony, etc., 
we would call it a statistical series and also a 
sample, intractable though it may be in revealing
any interesting or unifying fact of life. 

If we attempt to define statistical series 
and samples more narrowly than stated we run into 
logical difficulties. If we require that the 
elements of a sample shall be in truth observa- 
tions of a single sort, made upon things of a 
single sort selected in some single manner, we 
have narrowed it to the point of impossibility 
of exact fulfillment. What are phenomena of a
single sort and what is selection of a single 
sort? 

Let us consider a sample coming about as 
closely as we can conceive to meeting these con-
ditions: the lengths of clam shells collected
upon a certain beach. Are the recorded lengths
the lengths of a single phenomenon? Is the se-
lection of the successive shells made in a single
manner? Are the shells clam shells? If the 
characteristics of two species, say clams and 
snails, are so different that there is no over- 
lapping in regard to some character, we can 
certainly avoid getting snail shells when col- 
lecting clam shells, but many a problem of prac- 
tical statistics is not of this sort. It may 
well be that some shells other than those origi- 
nally thought of or defined as clam shells may 
be included in the sample. In the last analysis 
the safeguard against this lies in someone's 
judgment that shell A, all things considered, is
of the same species as shell B. We may call this
crucial act the judgment of sameness. 

Next consider length: what is the length of
a clam shell? Here is one which is chipped at 
the end and we discard it, but here is one with 
reference to which it is hard to say whether it 
has been chipped or not, and we are again forced 
to depend upon a judgment of sameness. 

Now consider the selection of shells under 
similar conditions. Shell A is collected at a 
certain time and space, and this identical time 
and space does not provide us with another shell. 
Shell A is collected at high, and shell B at low 
tide. Are these collected in some single manner? 
Clearly no. All we can ever assert is that the 
items have been collected under conditions judged 
the same so far as factors that we judge material
are concerned. Stated otherwise, there has been
a judgment of irrelevance of those conditions
of sampling that have been observed to vary from
item to item. Under optimum conditions we can
assert that the collector's threshold of differ-
ence recognition, as it pertains to the con-
ditions of sampling, is not exceeded by the
several experiences yielding the observations of 
his sample, but we may not claim more than this. 
The logical issues connected with sampling, 
probability, and truth have been thoroughly 
discussed by von Mises (1939). 

Mathematical statistics can sidestep all these 
issues by simply assuming that a true sample is 
at hand. This is never the case. To realize 
that applied statistics does not rest upon pure 
logic may be a disappointment to the reader, but 
it is because it does not that it concerns itself
with phenomena, not noumena, and that it is
adaptable to all the problems of life in their 
partly repetitive aspects, which are their chief 
aspects. All the fine-spun deductions of mathe- 
matical statistics are by some amount wide of 
the mark when applied to real phenomena. Logi- 
cally this must be so. They are astray by an 
amount which judgment alone bears witness to. 
How wide and what of it are questions that are 
very difficult of quantitative answer. As these 
difficulties are analyzed they regularly run
back to the question of the soundness of some 
judgment of sameness or of relevance. 


Consider the last. The geologist finds a 
certain fossil at 10 a.m. and a second at 3 p.m. 
of a certain day. It may not occur to him that 
the time of finding is pertinent to any intrinsic 
characteristic of the fossils and he neglects 
mentioning the time. Whether overt or not, this
is a judgment that the time is immaterial.
The non-overt judgments of the novice fre-
quently lead to insidious and serious error. If
the first specimen is found in a gravel bed and
the second in a clay bank, he undoubtedly would 
consider these space conditions as material and 
he would not report the items as of a single 
sample. The effective judgment may thus be 
either a positive or a negative act. Sampling 
errors due to this latter are very insidious and 
are committed by the man of narrow vision who 
blandly steps in where wiser men refrain to 
tread. 


The hazard of the positive act of judgment, 
which the collector of a sample makes, is of a
very different and lesser sort. The judgment of 
relevance, if made consciously and defended, may 
be one of those judgments which mankind can make 
with considerable competence. Consider the 
following demands upon judgment, assuming no 
instrumental aids in making the judgments: 
(a) is A twice as tall as B? (b) is A 1.5 times
as tall as B? (c) is A 1.1 times as tall as B?
(d) is A just as tall as B? The accuracy of the
respective judgments is a matter of psychology, 
which would probably inform us that the accuracy 
of the sameness judgment is far greater than that 
of any of the other judgments. Certainly in the 
matter of pitch discrimination a very small 
difference, say that between 256 and 256.1 vi- 
brations a second, can be distinguished when the 
two sounds are equally loud and heard under 
equally favorable conditions and differ in time 
from each other by a very short interval. Here 
the query is, "Are the two the same?" and the 
answer can be given with a precision far exceed-
ing the judgment that A is some multiple of B.
Though we lack a general proof we may well be- 
lieve that the sameness judgment is the most pre- 
cise judgment within the power of the human mind 
to make. 


In view of this we need not be unduly dis-
turbed by the fact that the pertinence of the
beautiful techniques of mathematical statistics
to an actual problem is contingent upon some
earlier act or acts of judgment. There is a real 
and important difference between mathematical 
statistics and the applied statistics with which 
the scientist is concerned in his attempt to 
discover or verify principles and laws underlying 
observable phenomena. 


There is a quadruple alliance inherent in 
sound statistical research: phenomena, i.e.,
data; logic, i.e., mathematics; human psychology 
in its power to judge sameness; and human psy- 
chology in its power to appraise relevance. If 
this total activity is to be at its best, the 
data must be germinal, the mathematical elabo- 
ration sound and penetrating, the judgment of 
sameness made with reference to things that the 
mind has primary faculties for sensing, and the 
judgment of relevance made in the light of an 
imagination so rich that few possibilities and 
connections are overlooked. 


We may not assert with reference to any data 
collected to throw light upon an unsolved issue 
that it merely awaits the expert to bring to 
fruit the important germ within. Though good 
judgment determined the data selected, it may 
still be barren. Also, prior to study, we cannot 
assert that any new data consequent to causes 
not known a priori, are void of virtue, but one 
large class does seem peculiarly sterile. In 
dealing with biological and social phenomena 
samples not composed of cases competitive with 
reference to the issues studied are, character- 
istically, barren of important relationships. 


SECTION 2. HISTORICAL ANTECEDENTS OF
MODERN STATISTICS 


Modern statistics is the outgrowth of a number 
of lines of social and scientific development. 
Because of this mixed ancestry, and because the 
convergence of these lines into a single disci- 
pline, statistics, is none too obvious, there 
are strong lines of cleavage in present proc- 
esses, and even in avowal as to the function of 
statistical method. Historically we find trade 
and vital statistics developing with the growth 
of business, particularly of international trade, 
and with national concern for welfare of citi- 
zens, their availability for the army, and their 
racial and vocational characteristics. The 
comprehensive work The History of Statistics 
(1918), edited by John Koren, is an excellent 
picture of this growth. The topics treated of 
in this work indicate the extent and detail 
reached through this line of development: Here 
are a few only of the topics falling under the 
first four letters of the alphabet under the 
general heading "United States." 


Accidents                      Corporations
Agriculture census             Cotton
Banks, statistics of           Crime
Birth statistics               Crop estimates
Census Office                  Death statistics
Children's Bureau              Defective classes
Church statistics              Delinquency
Civil War Veterans             Divorce statistics
Coal trade                     Duties
Commerce Commission,
  Interstate
Commerce, Department of
Commerce, foreign
Commerce, internal


Social phenomena have teemed with classifiable 
facts, and the student, whether tradesman or 
legislator, has realized that the careful col- 
lection of these facts, their assorting into 
independent or related categories, for successive 
intervals of time, was informative of the course 
of events, enabling an understanding and control 
of them not otherwise possible. This line of 
statistical development may be characterized as 
a response to the welling of social phenomena. 
With this as a starting point the technique of 
the accountant in recording transactions in money 
or things of monetary value is taken over and 
adapted to the recording of amounts in categories 
other than monetary. The good bookkeeper becomes 
the good statistician, and that is so. 

But this is only part of the story of modern
statistics. Let us turn to Studies in the History
of Statistical Method, by Helen M. Walker (1929).
The prospectus of this book states that it "pre- 
sents the modern use of statistics against a 
background of the work of De Moivre, Bernoulli, 
Gauss, Laplace, Quetelet, Galton, Ebbinghaus, 
Fechner, and many others." Here we find differ- 
ent names, different topics, and a radically 
different emphasis, as witness the following 
entries occurring under the first four letters 
of the alphabet in the table of contents: 


Airy                           Boas
Anthropologists,               Bowley
  statistical work of          Bravais
Association                    Charlier
Astronomers,                   Chi-square
  contribution to the          Contingency
  theory of probability        Correlation
Attributes                     Correlation, partial
Average                        Correlation, multiple
Bernoulli, Jacques             Curvilinear regression
Bernoulli, N.                  Differences
Bessel                         Distribution
Binomial expansion


A most elementary acquaintance with the names
and topics here listed shows a vast difference 
in approach from that previously mentioned. 
Modern statistics is a very real fusing of varied
antecedents which can be analyzed into much 
greater detail than into just the two particular 
lines mentioned, which we may call the social 
and the scientific backgrounds. In the scientific 
background we find developments in each of the 
sciences progressing more or less independently
of each other. There is a statistics of eco- 
nomics, of life insurance, of biology, of psy- 
chology, of engineering, of physics and chemistry, 
of astronomy, and of mathematics. The develop- 
ments in each of these fields are continually 
running into related fields, but each has its 
taint, such that a trained statistician unfa-
miliar with a given foreign language could almost
certainly pick up a statistical work written in 
that language and allocate it to its appropriate 
field merely by observing the formulas upon the 
pages. 


SECTION 3. OCCASIONS FOR RESORT TO STATISTICS 


Differences in logical antecedents: A very 
fundamental difference depends upon an underlying 
difference in the logical issue that is set. We 
may say that there are two occasions for resort 
to statistical procedure, the one dominated by a 
desire to prove a hypothesis, and the other by a 
desire to invent one. This has led to distinct 
schools of statisticians, both lying within the 
general field of scientific endeavor. 

The first school is that represented by mathe- 
maticians who start with certain elementary 
principles and deduce therefrom facts of distri- 
bution, frequency, and relationship. In so far 
as observed situations parallel these conclusions 
the same elementary principles are supported as 
applying to the data in hand. One weakness of
this approach lies in the fact that a number of
causes (different sets of elementary princi-
ples) may result in substantially the same net
result. A still greater weakness is that it is
essentially a deductive procedure and relatively
sterile in suggesting new causes, in inspiring
creative inferences. It is fundamentally a
method of proof and not one of invention; and 
just because it is a method of proof, it has a 
permanent place in statistical method. It must, 
however, if in the service of the social and 
biological sciences, be but a handmaid to the 
creative genius of mathematical synthesis and 
induction. 

The second school is best represented by those 
biometricians and economists who start with ob- 
served data and endeavor so to group them and 
treat them that the constant features of the data 
are made apparent. This is a process of sta- 
tistical analysis. It may at times be expected 
to be an involved process, for social phenomena 
are complex. Data are frequently warped to fit 
statistical convenience, but if statistics is to 
realize its high destiny, procedure must be 
flexible, for only when the method is mobile can 
it fit immobile data. The accurate measurement 
of those features of phenomena which are ex- 
ceptional, but not chance, is the unique province 
of statistical analysis. 

The present treatment follows the second 
school, for the starting point of every problem 
is data, actual data and not hypothetical. When 
data are found to approach a hypothetical form, 
as for example at times normality of distri- 
bution, then the beautiful relationships of the 
theoretical or ideal form are availed of to the
utmost, but it is intended that the treatment 
will never lose the characteristic of its origin, 
and that an observed agreement with objective 
phenomena will always be considered proof of
soundness of the technique pursued. 

The method of approach in this text is in- 
ductive, starting with data and deriving con- 
stants, and will not give the noumenal satis- 
faction that comes from tossing coins, throwing 
dice, and sorting cards, thus obtaining distri- 
butions which approach an ideal standard. 

Psychological warrant for statistics: It is 
probably true that in the social experiment of 
living difficult procedures are a sort of last 
resort. If the issues of life can be solved by 
logic, why use any other method? If the issues 
can be solved by the nearly exact procedures of 
chemical or physical analysis, why go to the 
bother of dealing with many samples? If the 
biological laws of inheritance are given by 
knowable and unalterable laws of cause and effect,
why investigate general tendencies of mating and 
of family relationship? In each instance there 
is no sufficient reason. The facts, however, 
seem to show that life does not reveal itself 
with such certainty. Even in chemistry and 
physics, only approximate answers to problems are 
given in exact and concise terms, the inexact-
ness in observed fulfillment being due not only
to "impurity" of the data at hand, but also to
the fact that logical and mathematical formu-
lations are characteristically, probably uni-
versally, but approximations to reality.

Where the approximations are close and where 
the data are, or can by selection or experimental
refinement be made, relatively pure, the adequacy
of the "principle," the "law," and the "formula" 
is greatest and statistical techniques are in 
least demand. Technology well represents this 
field as does that upon which it is built, the 
"established" facts of science. However, in the 
process of establishing such facts a very differ-
ent attitude was present. The physicist observes
seemingly irregular changes in X as Y changes. He 
repeats his experiment, controlling more and more 
of the conditions, and repeats again and again, 
and, if successful, reaches a law at the end of 
his work. He has been using statistics, though 
the engineer who utilizes his law does not. Sta-
tistics are inherently connected with the formu-
lative stages of knowledge. This is true with re-
gard to statistics as accumulations of repeated
observations, and as techniques for discovering
the characteristic features of such accumulations.
Statistics and experimental research are wedded.
In the case of the physicist introducing one after
another greater degree of control over his data,
it is especially to be noted that the stimulus to
do so has been consequent to antecedent statis-
tics, i.e., variability in several or many obser-
vations in which variability was not expected. In
cutting down this variability through the greater 
control of conditions, he may say that he is 
avoiding statistical techniques, but the fact is 
that statistical considerations have been the 
very cause of these later steps. In one sense we 
may say that the main purpose of statistics has 
been to eliminate itself. This is a sound view. 
Statistics is a research instrument and drops out 
of the picture to the degree that the problem 
becomes solved. However, as long as undetermined 
variability in outcome is present, the statistical 
aspects of the situation remain. In this sense
most of the important problems of social life are 
never solved, but are only in the process of 
being solved. 

This can be adequately expressed in terms of 
the total and the part variances entering into a 
statistical situation. We will illustrate the 
meaning of variance with the 20 hypothetical 
temperatures at a single weather station, as 
listed in Table I A. 

TABLE I A

TEMPERATURES

       Actual  Predicted  Errors  Deviations from means      Squares of deviations    Products
          $X$  $\bar{X}$     $e$   $x$  $\bar{x}$   $e$  $x^2$  $\bar{x}^2$  $e^2$  $\bar{x}e$
           74         73       1     4          3     1     16            9      1           3
           74         74       0     4          4     0     16           16      0           0
           66         66       0    -4         -4     0     16           16      0           0
           76         73       3     6          3     3     36            9      9           9
           70         73      -3     0          3    -3      0            9      9          -9
           72         73      -1     2          3    -1      4            9      1          -3
           66         67      -1    -4         -3    -1     16            9      1           3
           74         72       2     4          2     2     16            4      4           4
           70         72      -2     0          2    -2      0            4      4          -4
           70         67       3     0         -3     3      0            9      9          -9
           72         72       0     2          2     0      4            4      0           0
           70         68       2     0         -2     2      0            4      4          -4
           64         67      -3    -6         -3    -3     36            9      9           9
           68         67       1    -2         -3     1      4            9      1          -3
           68         70      -2    -2          0    -2      4            0      4           0
           66         68      -2    -4         -2    -2     16            4      4           4
           72         70       2     2          0     2      4            0      4           0
           68         68       0    -2         -2     0      4            4      0           0
           72         70       2     2          0     2      4            0      4           0
           68         70      -2    -2          0    -2      4            0      4           0
Sums     1400       1400       0     0          0     0    200          128     72           0
Means      70         70       0     0          0     0   10.0          6.4    3.6           0


The predicted values are those made by the
forecaster 24 hours earlier and based upon all the
meteorological information available to him at
that time. Thus $\bar{X}$ is that which is known about
the situation and $e$ is that which is unknown.
For each item the actual temperature, $X$, equals
the predicted temperature, $\bar{X}$, plus the error of
prediction, $e$, i.e., $X = \bar{X} + e$. The same holds
when dealing with deviations from means, for
$x = X - M$, and $\bar{x} = \bar{X} - M$, and $e = e - 0$, so that
$x = \bar{x} + e$.
The quantity $e$, being the error in the pre-
diction, is in the illustrative data of Table I A,
as it universally must be, independent of the
prediction $\bar{X}$. Otherwise expressed, as later
proven in connection with the derivation of a
correlation coefficient, the sum of the $\bar{x}e$
products is equal to zero. In this case the
variance of the $\bar{x}$'s (their mean square) plus the
variance of the $e$'s exactly equals the variance
of the $x$'s. We note that $10.0 = 6.4 + 3.6$ and
in general $V_x = V_{\bar{x}} + V_e$. This important property
of divisibility into independent additive parts
of this particular measure (the variance) of 
variability is not a property of any other measure 
of variability and is the basic reason why this 
measure facilitates thinking as does no other 
measure of variability. 
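
The additive property just described can be checked directly upon the data of Table I A. What follows is a minimal sketch in Python and is not part of the original treatment; the names `actual`, `predicted`, `mean`, and `variance` are illustrative assumptions, and the variance is taken as the mean square deviation (divisor N), as in the text. Since each deviation of the actual value is the predicted deviation plus the error, squaring and averaging gives the total variance as the two part variances plus twice the mean cross product, so the decomposition is exact whenever the cross products sum to zero.

```python
# Actual and predicted temperatures transcribed from Table I A.
actual = [74, 74, 66, 76, 70, 72, 66, 74, 70, 70,
          72, 70, 64, 68, 68, 66, 72, 68, 72, 68]
predicted = [73, 74, 66, 73, 73, 73, 67, 72, 72, 67,
             72, 68, 67, 67, 70, 68, 70, 68, 70, 70]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Mean square deviation from the mean (divisor N, not N - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Error of prediction for each item: actual minus predicted.
errors = [a - p for a, p in zip(actual, predicted)]

v_total = variance(actual)     # variance of the actual temperatures
v_pred = variance(predicted)   # variance of the predicted temperatures
v_err = variance(errors)       # variance of the errors of prediction

# Mean cross product of predicted deviations and error deviations.
m_pred, m_err = mean(predicted), mean(errors)
cross = sum((p - m_pred) * (e - m_err)
            for p, e in zip(predicted, errors)) / len(actual)

# Share of the total variance left unaccounted for by the forecast.
residual_share = v_err / v_total

print("total variance:     ", v_total)
print("predicted variance: ", v_pred)
print("error variance:     ", v_err)
print("mean cross product: ", cross)
print("residual share:     ", round(residual_share, 2))
```

For these data the error variance is 3.6 out of a total of 10.0, so that 36 per cent of the variance remains unaccounted for by the forecast. Dividing by the total variance in this way is equivalent to choosing units in which the total variance is 1.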

Let us choose such units that the variance in 
temperatures when no defining facts are given is 
1 (not 10 as in the data of Table I A). If the
defining facts give such information that it is
now known that the range of probable temperatures
has a variance 1/2, there is still much variability
in the temperature to be expected 24 hours hence.
The forecasting formula is inadequate to give pre- 
cise knowledge, so statistical issues of distri- 
bution of probable temperatures remain. They 
will continue to remain until forecasting is more 
accurate, and the person, whether meteorologist
or layman, interested in this residual 50 per 
cent of the variance will continue indefinitely 
to be concerned with a statistical problem. 
Table I B may help to picture the field wherein 
statistics operates. 


TABLE I B

SITUATIONS CLASSIFIED ON BASIS OF USE
OF STATISTICS AND STATISTICAL METHODS

                                     Resulting in situations wherein
Initial   Knowledge    Residual      --------------------------------------------------------
variance  of deter-    variance      Statistics are             Statistics are
          mining       after using   commonly not used by       used by
          factors      such
                       knowledge

          Precise                    A. Chemists, engineers,    F. Research scientists in
                                     and classes B, C, D,       certain fields of astronomy,
                                     and E.                     physics, bureau of standards,
                                                                coast and geodetic survey,
                                                                etc.

1.00      Excellent    .05           B. Technologists,          Class F and G. Research
                                     "practical" professional   technologists, scientists in
                                     men, and classes C, D,     such fields as the physical
                                     and E.                     sciences.

1.00      Very         .20           C. Rule of thumb           Classes F, G, and H. Research
          useful                     workers, executives, men   workers in medicine, education,
                                     who "do things," school    psychology, the biological
                                     men, and classes D         sciences, and occasionally in
                                     and E.                     the social sciences.

1.00      Useful       .50           D. Promoters,              Classes F, G, H, and I.
                                     charlatans, uncritical     Research workers in the social
                                     people, and class E.       sciences, the more careful
                                                                executives, salesmen, guidance
                                                                counselors.

1.00      Meaningful,  .80           E. Semi-competent          Classes F, G, H, I, and J.
          but poor                   students and uncritical    Fairly intelligent people in
                                     enthusiasts, gamblers      all walks of life. Words
                                     who characteristically     indicative of such use are
                                     "foot the bill."           "probability," "tendency,"
                                                                "likelihood," "usually,"
                                                                "sometimes," "generally," etc.


The primary field of statistics is represented
by the middle portion of the table. When knowl- 
edge is nearly complete, and .999 of the variance 
can be accounted for, but few issues hinge upon 
the small fraction of the variance which is un- 
known. When knowledge is very inadequate, the 
entire field is avoided so far as possible by 
serious people, and especially by scientific 
workers, and is left in the hands of uncritical 
people. The occasion for statistical thinking is 
so common that the question is not whether one 
shall use it, but how fine a tool it is in his
hands. This book aims merely to make explicit 
certain logical implications of phenomena, and in 
no sense to provide a tool that has not first 
been demanded by the requirements of clear think- 
ing, as applied to quantitative and qualitative 
facts of life. 

It should be obvious from Table I B that sta-
tistical methods should serve well many of the
common needs of ordinary folk. However, their 
use and value in this connection is limited by 
understanding and not by the intrinsic fitness of 
the tool. In writing for a lay audience certain 
definite restrictions as to concepts and tech- 
niques employed are imposed. These may be very 
serious, making it well-nigh impossible to illus- 
trate crucial points. Such terms as variance and 
residual, used in preceding paragraphs, are
taboo. Substitution for the word residual can be 
made, but no substitution for variance is possi- 
ble without first taking the time to teach quite 
a bit of statistics. As a consequence a large 
and important class of problems wherein a total 
variance is capable of analysis into its component
parts is beyond the scope of the understanding
of a popular audience. Undoubtedly
attempts at popular presentation of this concept
will be made, but we may expect them to be as
unsatisfactory as the attempt to explain correlation
by means of such common concepts as percentages
of a total, proportion of agreement,
and other elementary mathematical concepts, all 
of which give the impression of conveying information
while actually leading astray. What does
it mean to say that a correlation is 66 per cent 
of perfect, or such that there is 66 per cent 
agreement? Presentation of such concepts to lay 
audiences is fraught with misunderstanding. 

There remain many statistical concepts and 
techniques which may be employed in this case: 
There are available certain graphic methods, 
time charts, circle charts, growth curves, fre- 
quency polygons, means, ratios, percentages, 
limits, ranges, categories, frequencies in class- 
es, and still other concepts. Some of these have 
been sadly overworked, as for example the "ratio" 
in connection with I.Q.’s (intelligence quo- 
tients), E.Q.'s (educational quotients), A.Q.'s 
(accomplishment quotients), etc., where concepts 
of growth, dispersion, and correlation are sta- 
tistically more neat and informative. It should 
not be assumed that with literary skill the most 
recondite concepts can be presented to any audi- 
ence of normal intelligence. This is scarcely 
so. Have the valiant attempts to present "relativity"
in the Sunday supplements resulted in a
nation informed upon this subject? The next 
three chapters are concerned with such elementary 
statistical techniques that they are serviceable 
in connection with popular presentation, but it 
must not be assumed that they reasonably circum- 
scribe the field of statistical effort of the 
practical school man, psychologist, or sociolo- 
gist. They merely serve as an introduction to 
quantitative thinking. 

Analysis of phenomena: Papers presented to 
scientific bodies may be concerned with the 
analysis of phenomena, but presentations to popular
audiences are primarily propaganda, legitimate
enough if the statistical picture given is
accurate, but clearly serving a noninvestigative 
and nonanalytical purpose. The subtler phases of 
statistics deal with analysis, not with description.
It is the good fortune of propagandists
that this is so. It is very desirable that such 
a one should not overstep the limitations of his 
technique. To present some simple time chart 
and, by noting its ups and downs, proceed to 
analyze it and explain underlying causes is chi- 
canery or wishful thinking. The analysis of a 
time chart calls for information far beyond that 
given in the chart itself. It is very easy for 
an investigator without dishonest intentions to 
(1) have a theory, (2) draw a chart, (3) see 
tendencies in it which are explicable in harmony 
with his theory, (4) think that the chart sup- 
ports his theory, (5) present it to an audience, 
and (6) deduce in the presence of the audience 
the theory as consequent to the phenomena of the 
chart. Procedures like this discredit statistics 
and have, in poorly informed quarters, led to 
the charge that one can prove anything by statistics.


to by the delicate tool of statistical analysis.
Its uniqueness is its outstanding feature. To
believe, or try to get, several stories out of
it, or merely proof for some preconceived story,
is denying its genius. A Turner might paint the
Smoky Mountains in brilliant red and present a
charming picture, but what a sad story this
would be to one who knows them, and what a false 
guide to the scout seeking landmarks in territory 
new to him. 

The first function of statistics is to be
purely descriptive, its second function is
to enable analysis in harmony with hypothesis,
and its third function is to suggest by the force
of its virgin data analyses not earlier thought
of. In carrying out the first function we shall 
deal in forthcoming chapters with graphic por- 
trayal, phenomena of distribution of a single 
variable, and phenomena of relationships between 
two or more variables. All of this is elementary 
statistics. 

More advanced statistics concerns itself with 
such a study of given phenomena that invariant 
features of it are discovered, they being there,
of course, all the time, merely waiting an appropriate
throwing aside of overlaid rubble to
bring them to light. If a brilliant hypothesis 
has correctly conceived of some feature as being 
invariant, the testing out of the hypothesis is 
generally not difficult though transcending the 
simplicity of mere description. If no such hypo- 
thesis is present, there still is a possibility, 
by viewing the data now from this aspect, now 
from that, of discovering its stable features. 
If charged with painting a mountain, an artist
might first ride around it on horseback to discover
"the" vantage point above all others. So
the statistician with new material may search 
diligently and by devious methods to find the 
unique story in his data. All the interest of 
the detective, all the joys of the artist, may 
be combined in his search. 

Though statistical analysis has much in common 
with pure mathematics, there is one important 
difference. The mathematician assumes his initial 
conditions and then searches out their necessary 
consequences, building a more and more beautiful 
and intricate mathematical structure. The sta- 
tistician starts not with an assumption, but with 
given data with their conditions denying arbitrary 
assumptions, and seeks to build a thought structure
in harmony with it, and to indicate consequences
that are in harmony with it. If the statistician
could but first choose the nature of
his data and then ramify at heart's content and
announce consequential relationships, more and
more remote consequences could be pointed out.
But since given data are inflexible in nature,
then as characteristics are observed and deductions
made, the more remote the deduction, the
more uncertain its verity. Remote deductions are
like the end of a whiplash. It may oscillate
violently, though the whip handle move but slightly.
The argument involving a long chain of
consequential relationships which is so powerful 
in mathematical analysis can seldom be employed 
in statistical analysis. The slight initial 
error always to be expected in finite data may 
become augmented into a grievous falsehood by the
time the end of the long argument has been reached. 
For sound statistical analyses the magnitude of 
errors in the initial starting point must be more
or less accurately apprehended and their signifi- 
cance carried along through every step so that 
the final outcome of an analysis has attached to 
it a definite concept of probable error. The 
process of attaching to each step of a deductive
argument a correlative step of probable error 
may be long and difficult, but it can scarcely 
be abridged. 

If now we turn to mathematics as a tool in 
statistics, we shall find it essential and of
universal service. 


SECTION 4. THE MATHEMATICAL BACKGROUND OF
STUDENTS OF STATISTICS 


The common observation of teachers of statistics
that their students are ill-prepared in
arithmetic as well as algebra arouses no great
surprise in colleagues in other departments, because
every teacher is prone to blame the shortcomings
of his instruction upon the earlier
training of his pupils. Nevertheless, the
teacher of elementary statistics has surprising
ground for his charge. The writer's experience
with elementary students has mainly been with
those in education and psychology, but it has
included a fair sprinkling of students of economics,
biology, and mathematics. In terms of
excellence of mathematical background he would
rank them as follows: mathematics majors first,
followed in order by biology, psychology, education,
economics, and sociology majors. Of the
education majors one subgroup, those specializing 
in English, have consistently shown the poorest 
mathematical equipment, and generally the least 
"flair" for statistical thinking. This ranking 
seems congruent with the presumptive interest and 
cultural antecedents of students majoring in 
these different fields. Frequently it is helpful 
to both student and teacher to realize these 
group differences. 

A Background Test, so called because it was 
used to measure the mathematical background of 
students entering courses in statistics, was 
given to 406 students entering, or early in, a 
course in elementary statistics, drawn from 9 
classes at the Universities of Columbia, Harvard, 
Illinois, Minnesota, Oregon, and Stanford. The 
courses mentioned were given in psychology and 
education departments so that if the writer’s 
ranking of the mathematical background of stu- 
dents according to major fields above is fair, 
the 406 students represent a sampling which is 
not below average. One form of the Background
Test is printed in Appendix A. Students who are 
desirous of finding out their standing in this 
fundamental background material may take the test 
and score it themselves, and interpret it via the 
percentile norms, given in Appendix A, based on 
the 406 students. 

The idea of making a "good" score on a test 
has become so established that students will be 
sorely tempted to cheat themselves. Some of the
ways in which a taker of the test can do it are:
(a) Read test and scoring key in Appendix A before
starting the test. (b) When taking the
test, not rigidly adhere to time limits. (c) Be
aware that he is failing because he does not
quite understand directions, and look up an answer
or two just to get the drift of the thing.
(d) Before taking the test talk it over with
fellow students or instructor. (e) When scoring,
credit himself with an answer which he intended
to be the same as that called correct in the
scoring key, but which actually is a little
different. (f) Generally get rather disgusted
with his dumb responses and give a little extra
credit here and there. One who does any of these
things while taking the test cannot have the 
satisfaction of an unbiased appraisal of himself 
in comparison with the run of college students 
starting a first course in statistics. 

A number of very interesting tendencies have
come to light as a result of giving the Background
Test. It would be appropriate to discuss them 
here, but this would assist students desirous of 
taking the test under the standard conditions 
imposed upon the 406 from whom percentile norms 
have been drawn up. 

Suffice it here to say that the sketchiness 
of the average entering student's knowledge of 
arithmetic and elementary algebra would be ap- 
palling were it not true that even with such poor 
equipment students are still able to master some 
of the broad principles of statistical thinking 
and treatment of data. 

That, with appropriate background, accomplish- 
ment would be much greater than now possible has 
been forcibly voiced by the Committee approved 
by the Advisory Committee (1933) of Social and 
Economic Research in Agriculture, making its re- 
port to the Social Science Research Council in 
1932, entitled Collegiate Mathematics Needed in
the Social Sciences. The purpose of the Committee
was to determine what mathematics should be
taught to students of the social sciences. As
the Committee itself pointed out, its report has 
been largely concerned with the needs of students 
of economics. There is a slight overemphasis of 
mathematics connected with temporal series and 
index numbers in the report of the Committee, as 
judged by the needs of students of education and 
psychology, but, with slight modification, their 
statement is an excellent one as to these needs, 
and probably even as to the needs of biology 
students as well. They write: 


"The committee believes that a knowledge of 
the following phases of collegiate mathematics 
would be quite helpful and useful to a large 
proportion of the students in economics, and that 
many students in other social sciences would find 
it worth while to obtain the knowledge and train- 
ing that can be acquired through the study of 
these phases of mathematics, providing it can be 
done without taking too much time from the study
of other subjects.


"1. Logarithms: for understanding special
plotting methods and forms of equations; for 
numerical computation in connection with curve 
fitting and elsewhere. 


"2. Graphs: (as a mathematical tool) repre- 
sentation of tabulated data on ordinary and loga- 
rithmic papers; measurement of slopes and areas 
(especially in connection with frequency and 
cumulative curves); graphical determination of 
maxima and minima; representation of such three- 
variable relationships as are encountered in 
statistical studies, by means of three-dimensional 
diagrams and contours.


"3. Interpolation: by reading graphs (ordinary,
semi-logarithmic, etc.); by proportional
parts; and possibly by successive differences.


"4. Equations and forms of important curves:
straight lines; parabolic and hyperbolic curves; 
exponential and logarithmic curves; logistic 
curves; sine and cosine curves. The object 
should be to make the student sufficiently fa- 
miliar with the forms and characteristics of the 
simpler curves to enable him to recognize readily 
the type of curve suitable for fitting to any 
given body of data, and to appreciate the principal
implications involved in the use of such
curves.


"5. Probability: combinations, binomial theorem,
elementary probability; the normal probability
curve, its form, table, and equation;
probability and frequency distributions; probability
and time series.


"6. Elements of differential and integral 
calculus: the significance of a differential as 
a limit, as a rate or slope, as a frequency, 
etc.; partial differentiation; formula for 
differentiating elementary mathematical func- 
tions; procedure for determining maximum and 
minimum values; integration as the reverse of 
differentiation; relation between integration and 
summation; multiple integration.


"7. Curve fitting (mathematical principles):
plotting tabulated values, their logarithms, reciprocals,
etc., to ascertain whether the tabulated
points lie on a curve of simple form;
determination of the curve through such points,
or selected points within the scatter-diagram or 
chart; the method of least squares or other
methods for obtaining the constants of curves of 
best fit." 


The Committee then points out that the back- 
ground recommended can ordinarily be gotten by 
pursuing not less than 15 or 20 semester hours 
of work in a college department of mathematics,
as courses are now ordinarily given, and they
observe that this requirement is too heavy a 
demand upon economics majors, particularly in 
view of their belief that the desired mathematical
training could be given in from 6 to 9 semester 
hours if mathematics courses were organized with 
this in view. The Committee recommends the organ- 
ization of such courses in departments of mathe- 
matics. The proposal is an excellent one, but 
until carried out what is the student of sta- 
tistics in any one of the social sciences to do? 
If, in all justice to his program, he cannot take
15 or 20 hours of work in the mathematics department,
what is the alternative? That must be decided
by each student himself, but the writer
recommends that the average student, desirous of 
mastering the elementary concepts of statistics 
and not expecting to major in it or to devote his
life to research, first, take not less than a 
five-semester-hour course in college mathematics 
(a general comprehensive freshman course in 
college algebra, analytic geometry, etc., if 
available), and, second, that he do some serious 
reading upon the mathematical topics just cited 
to supplement his equipment. As a cue in this 
the student is particularly advised to remedy 
his deficiency as revealed in the Background Test 
of Appendix A. It may be stated that in con- 
structing this test a broad survey was made to 
determine what are the elementary mathematical
concepts demanded for the understanding of current 
research. Some of the relatively simple mathe- 
matical concepts found in the test are not heavily 
represented in the literature, for example,
11 3/8 divided by 1.65, or $a^3 a^6 = a^9$, whereas the
more advanced concept $(a + b)^n$ is a common and
powerful instrument in statistics. In other
words, the student should read with a purpose 
which is dictated by the needs of the problem of
the subject, not by previous organizations of
knowledge. If he wishes to know how to plot
y = sine of x, which upon recourse to references
he finds discussed on page 300 of some text, it 
should not be necessary for him to read the pre- 
ceding 299 pages to secure an answer, but it may 
readily happen that to understand page 300 he 
will have to follow back a chain of pages, perhaps
like this, 300-299-298-240-241-242-150-84, to
a family of concepts known by the student. This
type of reading in mathematics is economical of 
time and effort, and is to be highly recommended. 
That it can be followed in mathematics may not 
have occurred to students accustomed to think of 
the mathematics text as positively sequential, 
page by page. References could be given to in- 
numerable texts showing where problems of each of 
the types of those in Appendix A are discussed,
but any college student should be able to locate 
such references himself, and he is expected to 
do so in connection with each problem that he has 
missed, if he has taken the test. His arith- 
metical and algebraic shortcomings, as revealed 
by the test, especially if so pronounced as to 
place him in the lower quarter, should be remedied
before he starts Chapter VI, and his analytic 
geometry before starting Chapter IV. This state- 
ment asserts that the student is called upon to 
know certain principles of analytic geometry at 
an earlier stage than common principles of alge- 
bra. This is the case, and it is quite commonly 
so. The problems of social statistics have not 
been organized to fit the sequences of the writers 
of mathematics texts, or the organizers of high 
school and college curricula in mathematics. We 
should be content with this situation and expect 
statistics to provide its own sequences in harmony 
with its own vista of social problems. An enumeration
is here given of certain mathematical concepts
which are important for statistics and
which fall under the headings of the report cited:


Logarithms: There are two important sorts, 
logarithms to the base 10 (here designated "log"), 
which are used in computational work, and logarithms
to the base e (here designated "ln") occurring
in formulas and curve fitting. The logarithm
of a number, say 15, to the base 10, is the
power to which 10 must be raised to give 15:
thus $10^x = 15$, where x is the required logarithm.
The Napierian or natural logarithm is the exponent
of a magnitude called e which to the first
seven decimal places = 2.7182818. Thus $e^y = 15$,
or $2.7182818^y = 15$, where y is the natural logarithm
of 15. From the usual table of logarithms
we obtain

log 15 = 1.176091 = x = logarithm to the base 10

Employing a table of natural logarithms we obtain

ln 15 = 2.708050 = y = logarithm to the base e

Either one of these logarithms might readily
have been obtained, knowing the other, because
of a constant ratio maintaining between them.
Logarithms to the base 10 may be gotten from
logarithms to the base e by multiplication by a
quantity M, which to the first nine figures
equals .434294482, called the modulus of logarithms
to the base 10. Thus log a = M ln a. For
example, 1.176091 = .434294482 (2.708050).
Systems of logarithms in addition to these two
will scarcely concern the elementary student, but
he should be able to use either of these in simple
problems. The meaning of the terms "base," "modulus,"
"characteristic" (that part of a logarithm to the
base 10 to the left of the decimal point), "mantissa"
(that part to the right of the decimal
point), "logarithm," "anti-logarithm" (the number
corresponding to the logarithm), "co-logarithm"
(the logarithm of the reciprocal of the number)
should be known, for there is commonly a peculiar 
fitness in logarithmic presentation and interpre- 
tation of ratios and indexes and of quantities 
having a natural zero point, as, for example, 
the price of a commodity or a wage. Where a zero 
point is not known to exist, or where it is un- 
important, or is poorly defined, logarithmic 
treatment is seldom in order. These brief obser- 
vations, as well as those of succeeding para- 
graphs, are not intended to instruct the reader 
in mathematical background, but to make him aware 
of the elementary phases of the subject with 
which he should become acquainted, if not already 
a master of them. 
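
The relations among the two systems of logarithms can be checked numerically. What follows is only a minimal sketch in Python, using the standard `math` module; the variable names are illustrative and are not part of the text.

```python
import math

# Common (base 10) and natural (base e) logarithms of 15.
x = math.log10(15)   # power to which 10 must be raised to give 15
y = math.log(15)     # power to which e must be raised to give 15

# Modulus of logarithms to the base 10: M = log10(e) = 1 / ln(10).
M = math.log10(math.e)

print(round(x, 6))   # 1.176091
print(round(y, 6))   # 2.70805
print(round(M, 9))   # 0.434294482

# log a = M * ln a, so either logarithm can be had from the other.
assert math.isclose(x, M * y)

# Characteristic and mantissa of log 15.
characteristic = math.floor(x)
mantissa = x - characteristic

# Co-logarithm: the logarithm of the reciprocal of the number.
colog = math.log10(1 / 15)
assert math.isclose(colog, -x)
```

The characteristic here is 1 and the mantissa is the fractional part .176091, matching the tabled value quoted above.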

Graphs: So common a type of curve as that 
shown in Chart I-I, which the mathematically
uninformed reader thinks he understands, takes
on richness and niceties of meaning when one has
a mental picture of the graph of such functions
as the parabola $y = bx^2$, the exponential $y = ae^{x}$,
the Gompertz curve $y = a^{b^x}$, the simple logistic
curve $y = \frac{a}{1 + ce^{-bx}}$, or the growth-senescence
curve $y = \frac{2}{e^{-x} + e^{x}}$. Simple graphic portrayal
of data upon two-dimensional paper can commonly 
be made without any knowledge of the shape and 
properties of mathematical curves, but an exami- 
nation of graphs without such knowledge is rela- 
tively sterile in suggesting relationships. A 
knowledge of logistic and more involved curves 
is not expected of elementary students of sta- 
tistics, but they should, prior to statistical 
instruction, be acquainted with such elementary 
concepts as the straight line and its representation
by an equation, the elementary forms of
the second degree equation $y = ax^2$, $yx = a$,
$x^2 + y^2 = a^2$, the simple exponential $y = a^x$, and
they should soon become acquainted with more
complex curves. 
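
A reader wishing to see the elementary forms just named side by side might tabulate a few values of each. The sketch below is one way of doing so in Python; the constant `a = 2.0`, the function names, and the chosen values of x are all illustrative assumptions and do not come from the text.

```python
import math

a = 2.0  # illustrative constant, not taken from the text

def straight_line(x, slope=1.0, intercept=0.0):
    """y = slope * x + intercept."""
    return slope * x + intercept

def parabola(x):
    """y = a x^2."""
    return a * x ** 2

def hyperbola(x):
    """y x = a, solved for y."""
    return a / x

def circle_upper(x):
    """Upper half of x^2 + y^2 = a^2."""
    return math.sqrt(a ** 2 - x ** 2)

def exponential(x):
    """y = a^x."""
    return a ** x

# Tabulate each curve at a few arguments within the circle's range.
for x in (0.5, 1.0, 1.5, 2.0):
    print(x, straight_line(x), parabola(x), round(hyperbola(x), 4),
          round(circle_upper(x), 4), round(exponential(x), 4))
```

Plotting the tabulated pairs on ordinary paper shows at once how differently the five forms behave over the same range.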


Interpolation: The veriest tyro is called 
upon to interpolate when using tabled data. The 
subject is of increasing importance as more re- 
fined work is done. At the initial stage the 
student should be able to use correctly linear 
interpolation, that based upon first order differences,
in connection with any tabled data. An
illustration and definition of certain terms will
be made in connection with Table I C. This is 
the common arrangement of a table of logarithms. 
The primary use of the table is in getting loga- 
rithms, knowing numbers, and its secondary use 
is getting numbers, knowing logarithms. Based 
upon its primary use, the means of entering the

TABLE I C
LOGARITHMS TO THE BASE 10

NUMBER   LOGARITHM
   1       .0000
   2       .3010
   3       .4771
   4       .6021
   5       .6990
   6       .7782
   7       .8451
   8       .9031
   9       .9542
  10      1.0000


table, called the argument, is a number, and the 
value gotten out of the table, called the consequent,
is a logarithm. The logarithm of 5 is
.6990, or, in general terms, we enter with the
argument and obtain the consequent. A symbol 
which will be used in this text to indicate 
a one-to-one relationship between the measures 
of a first series and those of a second is ==, 
which is to be read "is equivalent to,"—the 
equivalence being in terms of a given principle 
of relationship. Thus the statement 5 == .6990
means that, in terms of the law of relationship 
given by the table, that is, by the equation 
y = log x, the magnitude 5, when otherwise ex- 
pressed, is the magnitude .6990. 

The same table may be used to obtain a number 
corresponding to a logarithm. When so used the 
entries in the second column become the argument 
and the entries in the first column the consequent.
To find equivalents corresponding to a
given magnitude commonly calls for interpolation. 
Thus, to find the logarithm of 3.5 we go through
this computation:

$$\frac{3.5 - 3.0}{4.0 - 3.0}\,(.6021 - .4771) + .4771 = .5396$$

or this computation:

.5 (.6021) + .5 (.4771) = .5396


This process is linear interpolation. It is the 
simplest kind. There is ordinarily an error in 
the answer obtained. In this case the error is
-.0045, because the correct answer gotten from a 
more detailed table is .5441. We may proceed 
similarly, as the reader may verify, to find the 
anti-logarithm of .5540, that is, the number 
whose logarithm is .5540. The answer is 3.6152 
which, from a more detailed table, is found to 
be in error by .0342. 
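
The two interpolations just described can be reproduced mechanically. The following Python sketch is offered only as an illustration; the function name `linear_interpolate` and the list names are assumptions of this example, while the tabled values are those of Table I C.

```python
import math

# Table I C: numbers 1..10 and their four-place common logarithms.
NUMBERS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
LOGS = [0.0000, 0.3010, 0.4771, 0.6021, 0.6990,
        0.7782, 0.8451, 0.9031, 0.9542, 1.0000]

def linear_interpolate(arguments, consequents, value):
    """Enter the table with `value` and interpolate on first-order differences."""
    for i in range(len(arguments) - 1):
        lo, hi = arguments[i], arguments[i + 1]
        if lo <= value <= hi:
            fraction = (value - lo) / (hi - lo)
            return consequents[i] + fraction * (consequents[i + 1] - consequents[i])
    raise ValueError("value lies outside the table; extrapolation is required")

# Primary use: number is the argument, logarithm the consequent.
log_3_5 = linear_interpolate(NUMBERS, LOGS, 3.5)
# Secondary use: logarithm is the argument, number the consequent.
antilog = linear_interpolate(LOGS, NUMBERS, 0.5540)

print(round(log_3_5, 4))                     # 0.5396
print(round(log_3_5 - math.log10(3.5), 4))   # -0.0045
print(round(antilog, 4))                     # 3.6152
print(round(antilog - 10 ** 0.5540, 4))      # 0.0342
```

Swapping the two columns is all that is needed to pass from the primary to the secondary use of the table, which is why one function serves both.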

Arithmetically exact linear interpolation 
should be within the power of the student before 
proceeding further, but parabolic and other im- 
proved methods involving second and higher order 
of differences are not expected of him at this 
time. The reading of a chart or graph with con- 
siderable accuracy by means of visual inter- 
polation should also be within his power. 

The symbol == will be found serviceable in 
many ways, for example, to indicate scores on 
two tests which are comparable, or equivalent. 
Thus, if the fifth-grade mean score on the Abb
spelling test is 48.7 and on the Baa spelling 
test is 17.0, these scores represent equivalent 
degrees of excellence and we may write:


Abb 48.7 == Baa 17.0


It may happen that the consequent for an argu- 
ment outside the limits of a given table is 
desired. It can be gotten by a process of extra- 
polation, but, in general, the process is subject 
to considerable error. For example, suppose we 
have given Table I C.

TABLE I C
LOGARITHMS TO THE BASE 10

NUMBER   LOGARITHM
   1       .0000
   2       .3010
   3       .4771
   4       .6021
   5       .6990
   6       .7782

and we desire the logarithm of the number 7. We 
proceed thus 


$$\frac{7.00 - 5.00}{6.00 - 5.00}\,(.7782 - .6990) + .6990 = .8574$$


This is in error by .0123. If the distance extrapolated
is small, the error may not be great,
but at best it is more hazardous extrapolating 
than interpolating, and effort should be made to 
avoid the necessity of doing so. The more refined
methods greatly improve the accuracy of inter- 
polation, but they may not improve extrapolation 
at all. 
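
The extrapolation for the logarithm of 7 may be checked in the same manner. This is a minimal Python sketch; the function name `linear_extrapolate` is an assumption of the example, and the two tabled points are the last entries of the shortened table.

```python
import math

def linear_extrapolate(x0, y0, x1, y1, x):
    """Extend the straight line through (x0, y0) and (x1, y1) to the argument x."""
    return (x - x0) / (x1 - x0) * (y1 - y0) + y0

# Last two entries of the shortened Table I C: log 5 = .6990, log 6 = .7782.
estimate = linear_extrapolate(5, 0.6990, 6, 0.7782, 7)
error = estimate - math.log10(7)

print(round(estimate, 4))   # 0.8574
print(round(error, 4))      # 0.0123
```

The error here is nearly three times that of the interpolation for 3.5, although both arguments lie within one unit of a tabled entry.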


Equations and forms of important curves: The 
student would do well to plot one or more of each
of the types listed by the committee and preserve them
for purposes of reference. The most difficult 
one to plot is the logistic curve, which calls 
for the use of logarithms or, most simply, of a 
"log log" slide rule. 

In addition to the curves mentioned, the 
following are particularly important to the stu- 
dent of education or psychology: the normal 
curve; the integral of the normal curve, which 
looks much like the logistic curve, and, along 
with other curves having the characteristic of a 
single inflection, is called an ogive curve; 
peaked, flat-topped, asymmetrical, and bi-modal
curves, the equations of which are somewhat
involved and beyond the scope of an elementary
treatment. Even so, the shapes of these curves 
and knowledge that concise mathematical state- 
ments of each of them are available are valuable 
facts to know. 


Probability: First in this connection is
knowledge of the binomial expansion $(a + b)^n$.
This is closely associated with the concept of
permutations and combinations, and with Poisson's
series, point distributions, the normal distribution,
and with certain skewed distributions.
However, most of these topics are appropriate
topics for a second course in statistics, so
that the beginning student is fairly prepared 
if he has a knowledge of the binomial theorem 
and can handle a few simple problems in combin- 
ations and permutations. 
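
As an indication of the kind of simple problem meant, the coefficients of the binomial expansion and a few counts of combinations and permutations can be computed directly. The Python sketch below uses the standard `math` module; the function names and the coin-tossing illustration are assumptions of this example, not material from the text.

```python
from math import comb, factorial

def binomial_coefficients(n):
    """Coefficients of a^n, a^(n-1) b, ..., b^n in the expansion of (a + b)^n."""
    return [comb(n, k) for k in range(n + 1)]

def permutations_count(n, r):
    """Number of ordered arrangements of r things chosen from n."""
    return factorial(n) // factorial(n - r)

print(binomial_coefficients(4))   # [1, 4, 6, 4, 1]
print(comb(5, 2))                 # 10 combinations of 5 things taken 2 at a time
print(permutations_count(5, 2))   # 20 permutations of 5 things taken 2 at a time

# With a = b = 1/2 the terms of (a + b)^n are the chances of 0, 1, ..., n
# heads in n tosses of a fair coin, and they sum to 1.
n = 4
terms = [c * 0.5 ** n for c in binomial_coefficients(n)]
print(terms)                      # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```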


Differential and integral calculus: An ele- 
mentary knowledge of calculus is of such vast 
importance to statistics that beginning students 
interested in a second or third course in the 
subject should take an elementary course in 
calculus, but without such it is to be expected
that they would be able to handle a first course
in statistics. 


Curve fitting and the method of least squares:
These topics are equally appropriate to mathematics,
engineering, statistics, and certain
other fields, but even an elementary knowledge
of them is hardly to be expected antecedent to
entering a first course in statistics. In fact
they might well be the legitimate subject matter
of a second course, and not a prerequisite to it.


Other topics not specifically mentioned in 
the report of the committee: The propagation 
of error in the terms of a chain of operations. 
This issue has to do with the number of figures 
which are significant in a sum, a product, a 
quotient, etc., when the number of significant
figures entering into the parts is known. The
subject, though fundamental to quantitative study 
in every scientific field, is seldom treated of 
in arithmetics and algebras. Accordingly, a 
brief treatment of the simpler phases of it is 
given herewith. 

In pure mathematics "significant figures" 
refer to those that are quantitatively meaningful
and not merely determinative of the decimal point. 
Thus in the three following, .7568, 75.68, .007568, 
the number of significant figures is 4 in each 
instance. In the case of .007568 (which many
scientists would prefer to express $7.568 \times 10^{-3}$)
the first two zeros merely serve to place the
decimal point and are not "significant." To
write 756800, so that the reader will know the
last two zeros merely place the decimal point
and are not themselves significant, it should be
given as $7568 \times 10^2$ or as $7.568 \times 10^5$. If written
756800, it is a figure of six significant figures.
Thus in mathematics the number of significant 
figures in a recorded value is the number of 
digits recorded, not counting zeros to the
immediate right of the decimal point in the case 
of numbers less than one. This meaning is useful 
in statistics but another meaning of the term is 
also important. 
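
The mathematical meaning of the term lends itself to a mechanical count. The Python sketch below is one possible rendering of the rule as stated; the function name `significant_figures` is an assumption of this example, and it expects a plain decimal string rather than a value written with a power of ten.

```python
def significant_figures(recorded):
    """Count the digits of a recorded value, ignoring zeros that only place the decimal point.

    Every recorded digit counts except leading zeros, including those
    immediately to the right of the decimal point in a number less than one.
    Zeros written out at the end of the number are counted.
    """
    digits = [ch for ch in recorded if ch.isdigit()]
    # Drop leading zeros, which merely place the decimal point.
    while digits and digits[0] == "0":
        digits.pop(0)
    return len(digits)

for value in (".7568", "75.68", ".007568", "756800"):
    print(value, significant_figures(value))
```

The first three values each give 4, and 756800 written out in full gives 6, in agreement with the text.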

When any measurement is taken the number of
figures or decimal places to which the measurement
is expressed should be limited by the accuracy of
the means of measurement employed. If a wooden
meter stick is used by a palsied man to measure
the length of a live angleworm, and the answer
recorded as 7.64381 centimeters, it is obvious
that most of the figures put down are not
"significant," that is, not trustworthy. Probably
the 7 is significant, and the 6 may be better than
a guess, but the 4, 3, 8, and 1 are not entitled
to the space and attention given them. In fact,
they should not appear as a part of the
measurement. The measurer may have looked upon
the 4, 3, 8, and 1 as representing the best that
he could do, but that is not sufficient to warrant
recording them. If he felt sure of the 7 and only
somewhat doubtful of the 6, it would be
appropriate to record the length as 7.6
centimeters. In this case, which is intended to be
typical of published data, the last figure
recorded is admittedly somewhat in error, but not
so much in error as to be no better than a guess,
and the figure preceding the last figure is
supposed not to be in error at all, except only as
there may be a small variation of a tenth carried
over from the error in the figure to the right.
Counting decimal places from left to right, we
may lay down the rule: record up to and including
the first figure thought to be in error, that
error being not more than 5 in its place.
Thus, if the length be recorded as 7.6, the reader
is to understand that the author considers the
error in the tenths place to be not greater than
5 in this place, and not less than 1/2, for, if
less than 1/2, the recording would have been
7.64, suggesting an error not greater than 5 in
the one-hundredths place. Let us write the length
thus, $7.\overset{*}{6}4381$, meaning thereby that
there is an expected error of not more than 5 nor
less than 1/2 in the figure under the star, so
that of course the 4, 3, 8, and 1 are meaningless.
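
The recording rule can be put as a small Python sketch. It assumes the measurer can state an expected error as a single number, which the text does not require; the names `decimals_to_record` and `record` are hypothetical.

```python
import math


def decimals_to_record(expected_error: float) -> int:
    """Number of decimal places d such that the expected error, counted in
    units of the d-th decimal place, is at least 1/2 and less than 5."""
    if expected_error <= 0:
        raise ValueError("expected_error must be positive")
    return math.ceil(math.log10(0.5 / expected_error))


def record(value: float, expected_error: float) -> str:
    """Write the value only as far as the first doubtful figure."""
    places = decimals_to_record(expected_error)
    return f"{round(value, places):.{max(places, 0)}f}"


# An expected error of about 1 in the tenths place: stop at the tenths.
print(record(7.64381, 0.1))
# An expected error of about 3 in the hundredths place: stop at the hundredths.
print(record(7.64381, 0.03))
```

The angleworm measurement, with an error of about one tenth, is thus written 7.6, the 6 being the starred figure.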

So much for original data. Now what happens
when numerical magnitudes having errors are added,
multiplied, and divided by other magnitudes with
errors? Consider

$$7.\overset{*}{6}4381 + 8.\overset{*}{4}7653 = 16.12034$$

Over which figure does the star belong in this
answer? Consider also the average

$$\frac{7.\overset{*}{6}4381 + 8.\overset{*}{4}7653}{2.000000000} = 8.06017$$

Which figure should be starred in this answer?
Clearly the first answer is $16.\overset{*}{1}$ and
the second $8.\overset{*}{1}$.
Consider the case

$$\frac{7.6 + 8.4}{2} = 8.0$$

There is an error of unknown amount but presumably
not greater than 5 nor less than 1/2 in the tenths
place in each addend. To make the problem specific
let us say that the error is 1 in this place and
that we do not know whether it is positive or
negative. Then 7.6 is in error by .1 and the
correct value may be written $(7.6 \pm .1)$.
Similarly the correct value for the second addend
is $(8.4 \pm .1)$, and we have

$$\frac{(7.6 \pm .1) + (8.4 \pm .1)}{2} = 8.1, \text{ or } 8.0, \text{ or } 8.0, \text{ or } 7.9$$

depending upon which signs are taken. The value
8.0 [given by $(7.6 + 8.4)/2$] is thus in error
by $-.1$, or 0, or 0, or $.1$. The average of
these errors algebraically is 0, and the average
of them irrespective of sign is .05 [given by
$(.1 + 0 + 0 + .1)/4$], which is seen to be less
than the error in the measures separately, which
was .1. If a positive error in the first addend
goes with a positive error in the second, or a
negative error in the first with a negative error
in the second, then the error in the average is
of the same order of magnitude as it is in the
addends separately, but if this is not the case,
the error in the average is of a less order than
in the addends separately.
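
The four sign combinations may be enumerated mechanically. The following Python sketch uses exact fractions to avoid rounding noise; the function name `average_errors` is an assumption, and the error is taken, as in the text, as the obtained value less the correct value.

```python
from fractions import Fraction
from itertools import product


def average_errors(addends, error):
    """Error in the obtained mean (obtained minus correct) for every
    combination of signs of an error of fixed size in each addend."""
    n = len(addends)
    obtained = sum(addends) / n
    errors = []
    for signs in product((1, -1), repeat=n):
        correct = sum(a + s * error for a, s in zip(addends, signs)) / n
        errors.append(obtained - correct)
    return errors


errs = average_errors([Fraction("7.6"), Fraction("8.4")], Fraction("0.1"))
mean_signed = sum(errs) / len(errs)
mean_absolute = sum(abs(e) for e in errs) / len(errs)
print([float(e) for e in errs], float(mean_signed), float(mean_absolute))
```

The average irrespective of sign comes to one-twentieth, half the error assumed in each addend.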

If the errors in the addends are "random" or
"chance," there is a much less error in the mean
than in the measures singly. The specific nature
of this decrease in error is shown in Chapter VI
where the standard error of the mean is derived.
As there proven, the order or size of an error
in a mean is $\frac{1}{\sqrt{N}}$ times the order of
size of the errors in the approximately equally
reliable addends, where N is the number of addends
yielding the mean in question.
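
A small Python check of the $1/\sqrt{N}$ statement follows. It is a sketch under stated assumptions: every addend carries an error of the same size, each sign is equally likely and independent of the others, and "size" of the error in the mean is taken as the root-mean-square over all sign combinations. The function name is hypothetical.

```python
import math
from itertools import product


def rms_error_of_mean(n: int, error: float = 1.0) -> float:
    """Root-mean-square error of a mean of n addends, each in error by
    +error or -error, taken over all 2**n equally likely sign patterns."""
    squared = [
        (sum(signs) * error / n) ** 2
        for signs in product((1, -1), repeat=n)
    ]
    return math.sqrt(sum(squared) / len(squared))


for n in (1, 4, 9):
    print(n, rms_error_of_mean(n), 1 / math.sqrt(n))
```

With four addends the error in the mean is half that of a single measure; with nine, one-third.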

It is left as an exercise for the reader to 
demonstrate that the error in a difference be- 
tween two numbers has the same limits as to size 
as that in their sum. In this instance there is
a cumulative effect if a negative error in the 
minuend tends to be associated with a positive 
error in the subtrahend, or vice versa. 

With this discussion as a background, we may 
now lay down the principle that the prime con- 
sideration in the collection of data to be used 
in determining means, or sums, or differences, 
is that they be not subject to "systematic" 
error, i.e., all prone to errors of the same 
sign. A very elementary illustration would be
the measuring of the height of a certain age- 
group of boys with shoes on, and reporting the 
average height for the use of people unaware of 
the fact that height as reported included shoe 
heel. We would expect so obvious an error as
this to be caught by anyone, but many equally 
systematic, though not equally obvious, errors 
enter into economic, psychological, and other 
nonphysical measurements, ordinarily of great 
importance to social scientists. The student of 
statistics should school himself to be sensitive 
to possible sources of systematic error. 

Further discussion of this is given in Chapters
VI and VII in connection with the standard error
and the probable error of the mean.

If two numbers having errors are multiplied,
where is the error in the product? This question
may be answered more definitely than that
concerned with a sum or an average. Consider

$$7.\overset{*}{6} \times 8.4\overset{*}{8} = 64.448$$

To make the problem concrete and at the same time
quite typical, let us consider the error to be
1 in the place starred, but we do not know whether
the error is positive or negative. Thus the
correct factors and the correct product are
given by

$$(7.6 \pm .1)(8.48 \pm .01) = 65.373 \text{ or } 65.219 \text{ or } 63.675 \text{ or } 63.525$$

whereas the obtained answer is 64.448, so that
the error is one of the following: -.925 or
-.771 or .773 or .923. The proportionate error
in the 7.6 is $.1/7.6$ or .01316. In the 8.48 it is
.00118, and in the product it is, regardless of
sign, .01435, or .01196, or .01199, or .01432.
The average of these, regardless of sign, is
.01315, which is approximately the same as the
larger proportionate error in the two factors.
In fact, we may in general consider the
proportionate error in a product as of the same
order of magnitude as that in the factor having
the larger proportionate error.
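
The same enumeration may be carried out for the product. This Python sketch again uses exact fractions; the function name `proportionate_errors` is an assumption, and the errors of .1 and .01 are those assumed in the text.

```python
from fractions import Fraction
from itertools import product


def proportionate_errors(x, dx, y, dy):
    """Proportionate error of the obtained product x*y, regardless of
    sign, for each combination of signs of the errors dx and dy."""
    obtained = x * y
    results = []
    for sx, sy in product((1, -1), repeat=2):
        correct = (x + sx * dx) * (y + sy * dy)
        results.append(abs(obtained - correct) / obtained)
    return results


x, dx = Fraction("7.6"), Fraction("0.1")
y, dy = Fraction("8.48"), Fraction("0.01")
props = proportionate_errors(x, dx, y, dy)
average = sum(props) / len(props)
print([round(float(p), 5) for p in props])
print(float(average), float(dx / x), float(dy / y))
```

In this instance the average of the four proportionate errors agrees with the proportionate error of the less reliable factor, 7.6, which is the point of the paragraph.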
Consider the two problems

$$316 \times .315 = 99.540$$
$$316 \times .317 = 100.172$$

To publish the first product as 99.5 may overstate
its accuracy and to publish the second as 100. may
understate its accuracy. The precise number of
figures to publish is only accurately determinable
by the aid of more information than given and
than, in fact, is likely to be available, but as
a first approximation we may adopt the rule:
Publish a product to as many significant figures
as in the least reliable factor. Publish a
quotient to as many significant figures as in the
least reliable term. The number of places carried
in computation should be at least one beyond the
number to be published, and if a long chain of
operations is involved, it may occasionally be
wise to carry computations to two places beyond
that demanded for publication.
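
The first-approximation rule for products may be sketched in Python as follows. The helper names are hypothetical, and the count of significant figures in each factor is supplied by the user, since it cannot be read from a floating-point number.

```python
import math


def round_to_significant(value: float, figures: int) -> float:
    """Round value to the stated number of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)


def publish_product(x: float, x_figures: int, y: float, y_figures: int) -> float:
    """Product rounded to as many figures as the least reliable factor."""
    return round_to_significant(x * y, min(x_figures, y_figures))


print(publish_product(316, 3, 0.315, 3))
print(publish_product(316, 3, 0.317, 3))
```

The two products of the text are thus published as 99.5 and 100., which shows both the rule and the roughness the author warns of: three figures carry one decimal place in the first case and none in the second.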

It is suggested that the reader investigate 
by cut and try methods what modification or re- 
finement of this statement is needed to make it 
more accurate. As examples, when the proportionate
errors are the same in the two factors,
consider at least two cases: (a) when the error
is not in the first significant figure, and (b)
when it is.

The novice in experimental investigation,
combining three statistics such as the following

$$103 \times .034 \times 1.200 = 4.2024$$

to obtain a final outcome, is likely to be
uncritical in the distribution of time and effort
which he puts upon the various aspects of his
problem. Suppose the terminal statistic equals
4.2024. Suppose .034 is gotten by using an
instrument, for example a psychological test,
provided by some earlier worker, and 103 and 1.200
by using newly devised instruments. Refinement
in these, leading to measures 103 and 1.200, has
been accomplished at great labor and it is futile
in view of the error inherent in .034. That a
chain is as strong as its weakest link has been
overlooked time and again by semi-competent
statisticians.

On the other hand, in many statistical problems
analogy with a chain is unsound. Thus,
characteristically, the mean is not as unreliable
as the most unreliable measure which has entered
it. Here the analogy may be to a rope of many
strands, whose strength is much greater than its
weakest strand.

It is sound practice so to report quantitative
results that a figure based upon observations
such as, say, 1.743, carries with it the meaning
that the 3, but not the 4, may be expected to be
in error. Suppose a finally computed statistic
is 174328, with an expected error in the last
three figures. Then the result should be reported
in some such manner as $1{,}743 \times 10^{2}$. The
mere fact that the decimal point happens to be one
or more places to the right of the last
significant figure does not constitute warrant for
publishing the meaningless figures, such as are
the 2 and 8. An investigator should think of a
computed statistic as in three parts, thus:
174|3|28, the first nearly indubitably
trustworthy, the third substantially meaningless,
and the middle part, of one digit only, of
questionable significance, and he should publish
the first two parts only.
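
The practice of publishing only the first two parts can be sketched in Python. The function name `publish` and the argument `figures` (the number of figures in the first two parts together) are assumptions made for illustration, and the sketch treats whole numbers only.

```python
def publish(value: int, figures: int) -> tuple[int, int]:
    """Return (mantissa, exponent) such that mantissa * 10**exponent keeps
    only the leading `figures` digits of a whole-number statistic."""
    digits = len(str(abs(value)))
    exponent = max(digits - figures, 0)
    mantissa = round(value / 10**exponent)
    return mantissa, exponent


mantissa, exponent = publish(174328, 4)
print(f"{mantissa:,} x 10^{exponent}")
```

For 174328 with four figures retained, the meaningless 2 and 8 are dropped and the power of ten carries the decimal point.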

In a formula involving several terms, some of
which are much less reliable than others, there
is generally little point in keeping the more
reliable figures to as many places as they are
significant, for the error in the outcome is
commonly determined by the error in the factor
having the largest error. When addition and
subtraction are involved, the largest absolute
error in the parts is to be considered, and when
multiplication and division are involved the
largest proportionate error is the error of
importance. A thorough discussion of this matter
is to be found in Scarborough's Numerical
Mathematical Analysis (1930) and an excellent
discussion, with numerical illustrations, is given
in Walker's Mathematics Essential for Elementary
Statistics (1934).

And finally, as an essential part of the
background of the student of statistics, should be
mentioned computational accuracy and knowledge 
of methods of checking the same. Many a student 
lacks computational accuracy in simple arith- 
metical processes, and few have the habit of 
checking a computation by performing it in a 
second and independent way. The idea that the 
need for such a habit has passed with the advent 
of computing machines is stultifying. The writer 
has found no operation in greater need of checking 
by the ordinary student than that of extraction 
of square root. The habit of proving, by squaring, 
the root found is simple and very effective. 
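
The habit of proving a root by squaring it is easily expressed in Python. This is a minimal sketch; the function name and the relative tolerance are assumptions.

```python
import math


def check_square_root(number: float, root: float, tolerance: float = 1e-9) -> bool:
    """Prove a root the second, independent way: square it and compare."""
    return abs(root * root - number) <= tolerance * max(1.0, abs(number))


print(check_square_root(1849, 43))            # a correct root
print(check_square_root(1849, 34))            # digits transposed in copying
print(check_square_root(2.0, math.sqrt(2.0)))
```

The check costs one multiplication and catches such slips as a transposition of digits, whether the root came from hand computation or from a machine.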

We may assert that there is need for a proper
psychological background. One’s processes in
arithmetic and algebra have commonly been built
up through the vehicle of abstract numbers. Many
of the magnitudes entering into a statistical
study are not of this sort, while others are.
For example, in $y = 3ax^{2b}$ it might be that 3, 2,
b are abstract quantities, not subject to
redetermination or experimental change, operating
upon the other measures and enabling them to be
expressed in an equivalent form y. If so, the
abstract number concept with reference to these
is appropriate. The magnitude a might be a
statistical concept, earlier determined by oneself
or by another investigator, and not intended to
be changed in this experiment, but nevertheless
an experimental constant and not of the nature
of a pure abstract number. The magnitude x may
be the crucial observation of this experiment.
As it is operated upon, every derivative from it
retains the stigma of its origin. If the
observation x has a meaning and if the final
function of it y has a meaning, then every
intermediate function has a definite physical
meaning, and the proper psychological attitude of
the student is to trace this out step by step. It
is questionable whether it ever happens that the
meaningfulness of the outcome is accurately and
fully appraised if the meaning of the successive
steps in the process has been obscure. Certain
magnitudes that the statistician is working with
are living and vibrant in a sense that the
abstract symbols are not. The 3, 2, b, and a
provide the form of expression but the life that
flows through is the x, and it is this life that
gives primary meaning to each successive function
that is dealt with. It is a social blood stream
flowing through algebraic arteries.

Plan of study: The table of contents provides
the plan of this text. The first ten chapters
constitute an integrated sequence built upon the
background as indicated in the preceding
paragraphs. The later chapters provide occasional
advanced techniques. In general, in these
chapters, citations to sources supplant full
derivations. This text may be used as a basis for
an advanced course in statistics, especially by
pursuing the references and developing the
techniques of the later chapters. The student
studying it as a first course should realize that
the early chapters give the first steps in many
sequences. Following upon elementary graphic work
of this text is graphic analysis; upon a study of
means and standard deviations is a study of other
descriptive features of a distribution; upon simple
correlation of quantitative measures is corre- 
normal distribution in one variable is the normal
correlation surface; following linear regression,
which is equivalent to fitting a straight line,
is curve fitting of all sorts; and paralleling an
algebraic interpretation, running throughout all
the preceding, is a geometric and trigonometric
interpretation involving more than three
dimensions. A member of an elementary class in
statistics made a most stupid remark, unless
perchance it was a bit of wry humor. He said, "The
work of this course gives practically all the
statistics that a schoolman could ever use,
doesn't it?" He might as well have observed that
only little care in thinking is useful if one is
engaged in a small business.

An early step in a statistical problem is the
collection of data. This problem has been treated
of in general terms (see Keynes, 1921) and, of
course, in detail in connection with many special
problems, particularly those of economics (see
Zizek, 1913 and Young, 1925). Because of its
almost infinite ramifications, depending upon the
purpose and amenability to control of the
phenomena under observation, the process is not
seriously presented in this text. Its study is
most appropriate after the purpose of an
investigation is clearly defined, and after
physical limitations of time and expense are
known.

A number of terms have been used in this chapter
which have technical statistical or mathematical
meaning. Since this section is concerned
with the appropriate background for a study of
statistics, these, together with certain other
equally elementary statistical terms and symbols,
are defined and criticized herewith. Additional
terms will be similarly presented in subsequent
chapters. The arrangement is alphabetic except
for such closely related terms as call for
definition together. The explanations of terms
here given are intended to be accurate as far as
they go, but not necessarily complete because it
is thought that definitions so rigorous as to
include the fields of meaning of higher
mathematics would be confusing to the elementary
student.


absolute value: the value of a magnitude taken
positively, thus the absolute value of -1 is 1,
and the absolute value of x is (-x), if x itself
is negative. Bars enclosing a quantity indicate
absolute value, thus: |x|, or |a|.


abscissa: see coordinates.
arbitrary origin: see coordinates.


argument: the value with which a table is
entered when a related value, called the
consequent, or tabled value, is sought. If one
finds, from a table, the logarithm of 5, the
number 5 is the argument, and the logarithm,
namely .69897, is the consequent.


axis: a straight line in a figure with reference
to which the figure is symmetrical, or one of
the axes of reference. See coordinates.


biometry: the application of measurement to
living things.


correlation: the tendency of paired measures
to vary concomitantly. This term is further
defined in Chapter X.


crude score: see score.
consequent: see argument.
constant: see literal notation.


coordinates: there are two kinds commonly
used in connection with two-dimensional
representation, rectangular and polar. Rectangular
coordinates consist of two lines at right angles,
the horizontal axis, or abscissa, and the vertical
axis, or ordinate. The intersection of the axes
is the origin, or zero point, from which
measurement is taken. Two variables related by
some equation or law may be represented, one by
distance in the direction of the abscissa, and the
other by distance in the direction of the
ordinate. Paired values define a single point on
the two-dimensional surface. The connection of all
such points yields a curve (still called a curve
though a straight line or composed of
straight-line segments). If this curve is such
that, for each value of the first variable, there
is one and only one value of the second, the
second variable is a single valued function of the
first. If at the same time the first is a single
valued function of the second, then the curve is a
monotonic curve, one that is always rising or
always falling. The singular "coordinate" refers
to distance in either one of the two dimensions.
The plural "coordinates" refers either to the two
coordinates for some point or to the two lines
at right angles through the origin, which are
also called the axes of reference. If the point
from which measurements are taken is chosen
for convenience, or if it has no special physical
significance, the origin is called an arbitrary
origin. Polar coordinates define a point in a
space of two dimensions by distance from an origin
and angular distance from a given line. This
system is less commonly employed in elementary
statistical work than rectangular coordinates.


curve: see coordinates. 


degree: in trigonometry, one
three-hundred-and-sixtieth of the circumference.
The degree of an equation is the highest sum of
the exponents of any term of an equation, thus,
$4x + y = 7$ is of the first degree, $4x^{2} + y = 7$,
or $4x + xy = 7$, of the second degree, etc. See
dimension.


dimension: a dimension of an equation or of a
term of an equation is the same as the degree of
it. Thus $xy^{2}z^{3}$ is of the sixth degree or sixth
dimension. See degree.
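
The definitions of degree and dimension can be made concrete with a short Python sketch. Representing each term as a mapping from variable to exponent is an assumption made for illustration, as are the function names; coefficients are ignored because they do not affect the degree.

```python
def term_degree(exponents: dict[str, int]) -> int:
    """Degree (dimension) of one term: the sum of its exponents."""
    return sum(exponents.values())


def equation_degree(terms: list[dict[str, int]]) -> int:
    """Degree of an equation: the highest degree among its terms."""
    return max(term_degree(term) for term in terms)


# 4x + y = 7: terms x, y, and a constant.
first = [{"x": 1}, {"y": 1}, {}]
# 4x^2 + y = 7 and 4x + xy = 7.
second_a = [{"x": 2}, {"y": 1}, {}]
second_b = [{"x": 1}, {"x": 1, "y": 1}, {}]

print(equation_degree(first), equation_degree(second_a), equation_degree(second_b))
print(term_degree({"x": 1, "y": 2, "z": 3}))
```

A product term such as xy counts the exponents of both variables, which is why $4x + xy = 7$ is of the second degree.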


distance: commonly used to indicate not 
merely spatial dimension, but also dimension in 
any quantitative function. Thus we may speak of 
the distance one score is from another, though 
the function may be intelligence, reaction time, 
or whatnot. 


equivalent: equivalent measures are those 
expressing the same thought in different terms, 
thus 39.37 inches are equivalent to 100 centi- 
meters. See page 32. 


evaluation: see formulation.


factor: in addition to the ordinary meaning
that a factor is one of the quantities entering
into a product, the term is used quite extensively
to indicate one of the components entering into
a sum, particularly if the components making up
the sum are uncorrelated with each other.


function: this is an aristocrat of mathe- 
matical terms. It is indifferent to such lesser 
terms as variable, constant, exponent, such oper- 
ations as multiplication or addition, and such 
specific relationships as subordination,
correlation, association, though all of these serve it


manner, which he who knows may designate, upon x
yields y, and furthermore that y is gotten in no
other way. Distinctions which are commonly im- 


; 
single valued, or monotonic, as explained in a 


preceding paragraph in connection with curves
plotted upon rectangular coordinates.


formulation: the determination of the specific
nature of a function. Evaluation is the process
of finding the numerical value of y which is a
function of x, for a given value of x.


imaginary number: in mathematics the square
root of a negative quantity is called an imaginary
number, and is contrasted with real numbers, those
having finite values between $-\infty$ and $+\infty$.
In addition to quantities imaginary in this
respect, there are in statistics values which are
"impossible" or "unreal", such as a product-moment
correlation coefficient greater than one, a
reliability coefficient less than zero, or a
negative standard deviation. To avoid confusion,
these latter would better not be designated
imaginary, however much they are a matter of
fiction.


increment: a term used ordinarily to indicate
a small value or a small change in value and quite
commonly represented by $\delta$ or $\Delta$. Such a
small increment approaches the differentials of
calculus as it becomes smaller and smaller.


invariant: this highly useful concept, wherever
employed, is peculiarly valuable in connection
with mathematical and statistical problems. If
the vertical, the north-south, and the
east-west dimensions of an egg are taken as it
is held in various positions, there result a
great many values, all different from each other.
These dimensions are variable as transformations,
that is, changes in position, are made. (This
geometric explanation of "transformation" is in
harmony with the algebraic explanation given
under the term transformation.) If the ratio
of the longest diameter to the shortest diameter
is computed for each position, it remains
the same and is accordingly invariant under the
transformations. These transformations may be
considered to be merely different ways of viewing
the same thing. If a thing is chameleonlike and
looks different as each point of view is taken,
there is no virtue in it, but if some aspect of
it can be selected such that it is the same no
matter the point of view (in mathematical terms,
no matter the axes of reference), it becomes a
thing in which there is no shadow of turning, and
in which truth resides. The discovery of the
invariants in phenomena is the main concern of
statistical and experimental science.
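
The egg illustration may be tried numerically. The Python sketch below substitutes a flat ellipse for the egg and a rotation about its center for the change of position; both are simplifying assumptions, as are the function names and the semi-axes of 3 and 2.

```python
import math


def rotate(points, angle):
    """Turn every point about the origin through the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def axis_extents(points):
    """East-west and vertical dimensions of the outline as it now sits."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def diameter_ratio(points):
    """Ratio of longest to shortest diameter through the center."""
    radii = [math.hypot(x, y) for x, y in points]
    return max(radii) / min(radii)


outline = [
    (3 * math.cos(2 * math.pi * k / 360), 2 * math.sin(2 * math.pi * k / 360))
    for k in range(360)
]
for angle in (0.0, 0.7, 1.3):
    turned = rotate(outline, angle)
    print(axis_extents(turned), diameter_ratio(turned))
```

The extents along the axes of reference change with each position, while the ratio of diameters does not; the ratio is the invariant.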


literal notation: three sorts of elements
enter into algebraic expression: (a) symbols of
operation, or of relationship, (b) symbols
indicating invariable features of the equation,
that is, constants, and (c) symbols indicating
variable features, that is, variables. A large
class falling under (b) consists of the ordinary
cardinal numbers. With quite a number of
exceptions, it may be said that in this text
letters near the end of the alphabet, particularly
x and y, will represent variable quantities,
letters near the beginning of the alphabet,
particularly a and b, will represent constants, or
invariable quantities, and symbols and Greek
letters will represent operations. For more detail
see the paragraph following upon symbols.


maximum and minimum: the meaning of the terms 
will be obvious to the reader, but their im- 
portance in statistical work is so great that 
they should receive definite obeisance whenever 
met in a statistical study. To assert that one's 
purpose is to get the maximum of enjoyment out of 
life is a literary pleasantry, but to say that 
the object is to make errors minimal should hold 
the attention and vivify the interest until the 
means for accomplishing this become clear. Do 
not slip over these key words to understanding, 
these cues warning one of a chance to observe a 
bit of intellectual cleverness. The neatness of 
mathematics is incorporated into statistics
through these concepts. 


monotonic: see coordinates.


observed value: this describes the value of
a variable in distinction (a) from the value
estimated from a knowledge of one or more
correlated values, or (b) from an unknown true
value.


origin: see coordinates. 
orthogonal: see space. 


percentage: a percentage is 100 times an 
equivalent proportion. If 5 out of 20 belong in 
a certain group, the proportion in this class is 
.25, and the percentage is 25. The subtlety of
this relationship is not so great as to provide 
an excuse for the common misuse of these terms. 


proportion: see percentage. 
real number: see imaginary number. 


score: a quantitative or qualitative record- 
ing of an observation. 


space: in addition to the familiar variety,
the student of statistics is frequently concerned
with an n-space, or an n-dimensional space, where
n is greater than 3, because he can so readily
represent four or more variables each in an
independent direction (orthogonal or at right
angles to each other) in this space. The mental
picture of four lines through a single origin,
each at right angles to the other, is beyond one,
and the student is not advised to weary his
intellect with trying to make such a picture.
Rather he should manipulate his four or more
variables according to the rules of algebra, or he
should reason by analogy with the geometry of two
or three dimensions. In other words, when an
n-dimensional proposition is presented do not
become panic-stricken and consign the treatment of
it to mortals cast in a higher mold and endowed
with a pineal eye.

statistic: a statistic is any quantitative or
qualitative characterization of a thing observed,
or any summarized statement of such. John Poe,
aged 12, height 59 inches, is one of a group of
70 whose mean age is 13.0, average height 63.3
inches, in spite of one member being but 48
inches tall. Every item in italics, as well as
an innumerable number of others not mentioned,
is a statistic. Statistics is the plural of
statistic, as defined, and also the entire body
of sound practice which has been developed for
summarizing observations and discovering
invariant features of them. These summarized
statements, such as mean age 13.0, do not "prove"
points. They merely describe a situation. The
expression "statistics prove" is always a
misstatement. Statistics at their best "measure,"
permitting him who will to "infer." We correctly
say that a yardstick "measures" the height of a
child, not that it "proves" it.


symbols: it is thought that a mere mention 
of most of the symbols to be employed in this 
text should suffice, so that explanations of 
only a few follow: 


=  equality

≡  identity

≐ or ≈  approximate equality

==  equivalence, see page 34

+  plus

−  minus

÷, a fraction bar, /, or :  division, thus
$a \div b$, $\frac{a}{b}$, $a/b$, $a:b$

×, ·, or nothing at all, meaning multiplication,
thus $a \times b$, or $a \cdot b$, or $ab$

∞ or ω  an infinitely large magnitude, or a
"true" magnitude, i.e., one having no chance
error in it.

exponent, to indicate a power, thus
$a^{4} = a \cdot a \cdot a \cdot a$

√  square root; $\sqrt[3]{\ }$ indicates cube root, etc.

fractional exponent, to indicate root, thus
$a^{1/2} = \sqrt{a}$

subscript, to specify the one of several of
a sort, thus $x_1$, $x_2$ ordinarily stand for
score or measure on a first variable and
on a second; $x_{1a}$, $x_{1b}$, etc., to indicate
score of subject a on the first variable,
of subject b on the first variable, etc.

Σ  Greek capital sigma, to indicate the sum
of all the measures in question, thus $\Sigma x_1$
equals the sum of all the $x_1$ measures for
the group in question, equals
$x_{1a} + x_{1b} + \cdots + x_{1n}$
if there are n measures.

...  to indicate omitted items of a series, as
shown just above.

)(  to indicate excluded term of a series, thus,
fij,3 -.- 114 «9 7 12.345688

Π  Greek letter pi, used as a symbol of operation
indicating the product of all the
measures in question, thus,
$\Pi x_1 = x_{1a} \, x_{1b} \cdots x_{1n}$
if there are n measures.

e = 2.71828183, the Naperian base of logarithms,
which is a magnitude entering into
many formulas.

N  indicates the number of measures in a series.
When used as a subscript, as in
$x_{1a}, x_{1b}, \ldots, x_{1n}$,
n may be used in lieu of N because its
size makes it better adapted for use as a
subscript.
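
The two symbols of operation, Σ and Π, correspond directly to a sum and a product over the measures in question. The following Python lines are a minimal sketch; the three scores are invented for illustration only.

```python
import math

# Hypothetical scores of subjects a, b, and c on a first variable.
x1 = [3, 5, 8]

n = len(x1)               # N, the number of measures in the series
sigma_x1 = sum(x1)        # capital sigma: the sum of all the measures
pi_x1 = math.prod(x1)     # capital pi: the product of all the measures

print(n, sigma_x1, pi_x1)
```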


transformation: the substitution of a second
variable for a first of which it is a function,
generally a linear function. Thus, if
relationships involving x are given, and in lieu
thereof relationships involving y are determined,
where $x = a + by$, a linear transformation has been
made. See invariant.


transmutation: used synonymously with
transformation, which is the mathematical and
approved term. No alchemical implications are
involved.


variable: see literal notation. 


SECTION 5. PLAN OF STUDY 


If the reader has looked upon statistical
method as a tool to be resorted to when necessity
dictates, it probably holds a very lowly status
in his mind. In one sense this is correct, for
statistics is definitely a servant to the
sociological, biological, and political sciences,
but a very independent servant in his methods, a
servant who cannot be ordered to "prove a point,"
but only to investigate it; a servant who
repeatedly tells his master, the setter of the
problem, that he is on the wrong track, that the
problem is incapable of solution with the data
at hand, or that refinements of procedure called
for if a solution is to be obtained are far beyond
those anticipated.

As an adviser of graduate students it has been
the writer’s experience to have "worthwhile"
issues presented, and time and again with no,
little, or inadequate knowledge of what is implied
in reaching a solution. As one of many examples
he could cite the case of the student who wished
to prepare himself to make analytical diagnoses
of abilities of pupils and to give them sound
educational and vocational guidance, but who had
no slightest idea of studying or utilizing partial
or multiple correlation. To such a student
statistics asserts its independence, demands
techniques, defines the limits of undertakings as
dependent upon data and treatment, and finally
pontificates in the matter of reliability of
diagnoses. Statistics is the servant in the
house, but, whether known or unknown to the
master, is guided by rules of conduct so full of
virtue that they may not be overstepped.

The student who really attains a frame of mind
that is sympathetic to statistics must find
satisfaction in neatness, dispatch, soundness,
and generality of procedure, not satisfaction
only in the proof of a cherished belief. In fact,
the first-mentioned satisfactions should be so
great that a disproof of a belief carries but a
minor sting.

The usual elementary text in statistics seeks
to acquaint students with the techniques
necessary to obtain a number of highly valuable
summarizing statistics: averages, measures of
variability, a few measures of correlation, and
mastery of some of the elementary graphic devices.
This text considers these things as a secondary
aim, the primary aim being to acquaint the student
with the logic back of these procedures. It is
believed to be more important for a student to
understand why a mean is employed than to know
the steps for calculating it; why and when a
product-moment correlation coefficient is
significant than to master the intricacies of the
xlnt correlation chart. This broader purpose
necessitates a different and a more extended
treatment of so-called elementary issues than is
usual. It is accordingly here attempted to cover
in the earlier chapters a smaller number of
different statistical concepts than are usually
covered in an elementary book, but to do so with
greater than usual logical completeness.

In that this text is designed to serve as an 
introduction to advanced study, the notation used
is that appropriate to advanced statistics. For
example, it would be quite possible to write an 
elementary statistics without recourse to letters 
in the Greek alphabet. This, however, is not 
here attempted, for Greek letters will not only 
serve in the elementary problem but in addition
will provide familiarity with concepts essential 
in more advanced statistics. Special care has 
been taken to provide the simplest notation
congruent with prospective needs of the student. 
This occasionally, but only occasionally, leads 
to a notation which is a trifle more elaborate 
than the immediate problem demands. 


PROBLEMS 
Problem 1.
As a vocabulary review the student may test
his understanding of the following words and
phrases.

abscissa            distance         maximum                   positive
absolute value      equality         minimum                   real number
arbitrary origin    equivalent       monotonic                 reduce
argument            evaluate         multiple valued function  score
axis                factor           negative                  single value function
biometry            finite           ordinate                  solution
central tendency    formulation      origin                    space
consequent          function         origin, arbitrary         statistic
constant            identical        polar coordinates         sum
coordinate          identity
coordinates         imaginary
correlation         increment
crude score         indeterminate
curve               interpolation
degree              invariable
determinate         invariant
diameter            line
dimension           mathematics

Problem 2.
With the general principles covering the propagation
of error in mind, place a star over the
appropriate figure in the right-hand members of
the following equations to indicate the most
probable location of the errors in the answer
consequent to errors as designated by stars in
the terms of the left-hand members.

 1. 123.5 + 11.62 = 135.12        11. 108.8 ÷ 3.0 = 36.2
 2. 154.42 + .014 = 154.434       12. .0747 ÷ .1245 = .6000
 3. 95.75 - 13.4 = 82.35          13. 192 ÷ 1.6 = 120.0
 4. 67.805 - 13.60 = 54.205       14. 273 ÷ .007 = 39,000
 5. 16.10 + 5.300 = 21.400        15. (2.7)² = 7.29
 6. 103 × .305 = 31.415           16. (1.5)³ = 3.375
 7. 41.5 × .014 = .5810           17. (.0012)² = .00000144
 8. .07 × 8.6 = .602              18. √225 = 15.00
 9. 20 × 15.0 = 300.0             19. √3.61 = 1.9000
10. .007 × .44 = .00308           20. √.000529 = .02300

These exercises have been given because of 
the common error of recording far more figures
than are significant. 
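
The point of these exercises may be sketched in code. The fragment below is an illustration only and not part of the original exercises: it assumes the common working rule that a sum is trustworthy no further than the last decimal place reported by its least precise term, and the function names are hypothetical.

```python
from decimal import Decimal


def decimal_places(text: str) -> int:
    """Number of digits recorded after the decimal point."""
    return max(0, -Decimal(text).as_tuple().exponent)


def rounded_sum(a: str, b: str) -> Decimal:
    """Add two recorded measurements, keeping only the decimal places
    that both terms actually report (an assumed rule of thumb)."""
    places = min(decimal_places(a), decimal_places(b))
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1')
    return (Decimal(a) + Decimal(b)).quantize(quantum)


# 123.5 is recorded to tenths only, so the hundredths digit of the
# raw sum 135.12 carries no real information.
print(rounded_sum("123.5", "11.62"))  # 135.1
```

The measurements are passed as strings so that trailing zeros, which record precision, are not lost before the places are counted.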

The following problems in plotting mathemati- 
cal equations range from very simple to diffi- 
cult, though these latter (Problems 7 and 8) 
involve nothing beyond the handling of logarithms 
and negative exponents. 


Problem 3. Plot the straight lines 
(a) y = 4 + 3x
(b) 2y = 4 + 3x 
Relate the slope of each line as shown graphi- 
cally with the coefficients of x and y. 
Problem 4. Plot the third degree parabola
y = 1 + 3x² + x³, which may be written
y - 5 = (x - 1)(x + 2)²
Relate the graph to this second way of writing 
the equation. 
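
Before plotting, it may help to confirm that the two ways of writing the curve, y = 1 + 3x² + x³ and y - 5 = (x - 1)(x + 2)², describe the same function. The following is a minimal sketch with hypothetical function names; it checks agreement at a few integer points and does not replace the graphical exercise.

```python
def y_expanded(x):
    """The curve written as a polynomial in x."""
    return 1 + 3 * x**2 + x**3


def y_factored(x):
    """The same curve written about the level y = 5."""
    return 5 + (x - 1) * (x + 2) ** 2


# Integer inputs keep the arithmetic exact, so equality can be tested directly.
for x in range(-4, 4):
    assert y_expanded(x) == y_factored(x)

# The factored form makes the points at height y = 5 easy to read off.
print([x for x in range(-4, 4) if y_expanded(x) == 5])
```

The factored form is useful because its right-hand side vanishes wherever a factor is zero, which locates the points where the curve meets the horizontal line y = 5.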


Problem 5. In Table XV C the proportion of
cases in a unit normal distribution lying to the
left of the corresponding x-value is p. The
equation relating p and x is

$$p = \int_{-\infty}^{x} z \, dx, \quad \text{in which} \quad z = \frac{1}{\sqrt{2\pi}} \, e^{-x^{2}/2}$$

Since the related values of p, z, and x are given
in Table XV C no computations are necessary in
making the following graphs: (a) z as a function
of x, and (b) p as a function of x. (a) is the
highly important normal distribution and (b) is
the graph of its integral.
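
For a reader without the table at hand, the related values of x, z, and p can be computed directly. The sketch below uses only the Python standard library; the function names are hypothetical, and the integral is evaluated through the error function, which is a standard identity for the unit normal distribution rather than a method given in the text.

```python
import math


def z(x):
    """Ordinate of the unit normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def p(x):
    """Proportion of cases lying to the left of x (integral of z)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


# A few related values of x, z, and p, ready for plotting.
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"{x:5.1f}  {z(x):.4f}  {p(x):.4f}")
```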


Problem 7. The simple logistic is an important
growth curve. Its equation in the general
form is


e k y k 
Е и 


Dr. Marian Wilder has determined the constants 
k, a, and b, for the logistic which fits as
closely as possible (in the least squares sense) 
to the integral of the unit normal distribution. 
The equation is 
SUR AE Dp Ж 
Y i + en 1s 637% 


Plot this and compare with (b) of Problem 6.


Problem 8. The Gompertz curve 


is also a growth curve. Dr. Marian Wilder has 
determined the constants a and b for the Gompertz 
curve, which fits as closely as possible (in the 
least squares sense) to the integral of the unit 
normal distribution. The equation is 


у = 1.9019:3-15007” 


Plot this and compare with (b) of Problem 6.


CHAPTER II
STATISTICAL SERIES 
SECTION 1. TYPES OF DATA 


Logically we may classify series as follows: 


I. Temporal: (a) qualitative or (b) quantitative
differences, shown as time changes.


II. Spatial or Geographic: (a) qualitative or
(b) quantitative differences shown as
location in space changes.


III. Momentary (or Pseudo-momentary): (a) qualitative
or (b) quantitative differences
shown as the items in a sample change.


Under pseudo-momentary falls the large class of
observations which it is assumed have not been
altered by the particular moment or place of 
observation. For example, frequently the records 
of a week ago, a year ago, or of today, of a 
chemist may all be thrown into a common distri- 
bution, believing that the moment or place of 
observation has been immaterial to the issue at 
hand. 

The following naming of series has been here 
adopted: 

I. (a) Qualitative-temporal series.

(b) Temporal series.

II. (a) Qualitative-spatial, or qualitative-geographical
series.

(b) Spatial, or, if related to location
upon the earth, geographical series.


III. (a) Qualitative series.
(b) Quantitative series.


Let us consider these in more detail. 


Temporal series: If a single thing, say a 
child, is observed at successive periods of time 
with reference to a single character (the term 
preferred by biologists), or trait, or character- 
istic, a time series results. Usually the suc- 
cessive measures are quantitative, as would be 
the case if height is measured, but they need not
be. Thus if the successive observations of an 
infant's vocalization give the syllables blah, 
ma, bab, omp, these constitute a qualitative time 
series, as the differences in the trait, as time
varies, have been the issue.

Table II A is another illustration of a qualitative-temporal
series. As is generally the case,
it is here obvious that fuller information would 
disclose quantitative phenomena underlying most 
of the qualitative statements listed. 


TABLE II A


IMPORTANT STEPS
IN THE AUTOMOBILE INDUSTRY*


1900  "Mass production" of motor cars first employed
1901  First American speedometer made
1902  Alloy steels used in making automobiles
1904  Head lamps included as standard equipment
1906  Front bumpers and electric horns pioneered
1908  Left-hand drive popular
1909  First closed bodies built
1911  Electric starting, lighting, and ignition first
      combined in a single system with storage battery
1912  Windshields made as part of body by one company
1915  Top and windshield became standard equipment
1915  Steel frame body introduced
1917  Disc wheels appear
1918  Tire sizes standardized
1919  High-pressure chassis lubrication adopted
1920  Windshield wiper universal
1921  Hydraulic brakes on some cars
1922  Balloon tires appear
1924  Bumpers became standard equipment
1925  Four-wheel hydraulic brakes and all-steel
      bodies used
1926  Rubber engine mountings, rubber spring shackles,
      and adjustable front seats introduced
1927  Gearshift standard on all cars
1928  Safety glass introduced as standard equipment
1929  Automobile radio introduced
1931  Vacuum spark control used
1932  Drop-center rims replaced "detachable"
1933  First steel top used on some cars
1934  Synchronized front and rear springs introduced
1935  All steel bodies universal. Steering-post
      gearshift on one make
1937  Built-in windshield defroster ducts
1938  Improved windshield vision, independently
      sprung front wheels attract attention
1939  Steering-column gearshift becomes popular
1940  Sealed-beam headlights universally approved


* AUTOMOBILE FACTS, 11, No. 2, October, 1939.


If one child at age 18 months is measured for
some trait, a second child at age 24, and a third
at age 39 months, etc., there results a series
(a pseudo-time series) which is not a true time
series, for no single principle of sampling has
been employed, that is, the successive items
represent differences with reference to (a) time
and (b) subject. When such series are treated as
time series it is upon the assumption that differences
in (b) are immaterial. This is generally
a hazardous assumption and should call for specific
study before being made. Most achievement
and intelligence test norms which are intended to 
be time series or growth curves in the function 
in question involve this assumption, for it has 
seldom been attempted to base norms upon the same 
subjects as they age from year to year. It is 
commonly, for example, assumed that the thirteen- 
year-olds fairly represent what the twelve-year- 
olds will become in a year's time. Expediency is 
the excuse for this method of approximating true 
time and true growth curves. 

If the trait measured at successive periods of 
time is a developing one, it may be called a 
growth series, and, when plotted, a growth curve. 
A growth curve generally starts at a point near 
zero. It is not to be expected that the exact 
starting point can be caught; even if the start 
in the case of a living bisexual organism is 
called the moment when the sperm impregnates the 
ovum, still the point is arbitrary, for both 
sperm and ovum have had antecedent histories. 
Starting with an initial measure, near zero, the 
typical growth curve increases, with first in- 
creasing and later decreasing acceleration, to 
some physiological maximum at a date which may be 
fully as difficult to determine as the zero point 
date. 

In contrast to this typical growth curve is 
the time series characterizing a much later stage.
An economist may not know the time and correlative
price of the first bushel of wheat. The starting 
point is lost in the remote past, but the fluct- 
uation of the price of wheat from day to day and 
month to month, as time is now changing, yields 
a very typical time series, but it is no longer
called a growth series. 

Typical issues arising in connection with time 
series are not the same as for other series. In 
Chapter V it is pointed out that the essential 
issues, the essential techniques, and the essential
summarizing statistics, all change as
the type of series changes.

As a second type of series may be mentioned
the geographic series, or, to use the more comprehensive
term, the spatial series. In this
case differences in time are either (a) avoided
by collecting all items at the same time, or (b)
considered irrelevant, but locations from which 
the data are collected are of prime importance. 
If the material collected has been distributed in 
three-dimensional space, the very general spatial 
series results. An illustration of such a series
would be a survey of the dust content of the air 
in the different city precincts and at different 
elevations above street level. Problems involving 
data of this three-dimensional sort are becoming 
more common because of air and submarine transport.
A not too large portion of the surface of
the globe may be considered a plane, so that the
ordinary spatial series involves differences in
locus of items in two dimensions only. The series
is then called a geographical series. Each term
collected is uniquely and intrinsically associated 
with the point of origin. If this fact is lost 
in any step of procedure, the series immediately
ceases to be a geographical series. If in a plot
of ground laid off like a checkerboard, one unit
of area reveals 2 cut worms, a second unit ?3 
ants, a third 1 glow worm, etc., these successive
qualitatively different items attaching to the 
differences in location constitute a geographical 
series. The more ordinary sort is found when the 
differences revealed are quantitative, as would 
be the case if one unit of area, perhaps an entire 
township, yielded 10,000 bushels of wheat, a second
zero bushels, a third 5,000 bushels, etc. The 
essential statistics of geographical series are 
even more specialized than are those of a time 
series. 

The map is essential in portraying spatial 
series. Many spatial series show both qualitative 
and quantitative differences, in which case a con- 
siderable ingenuity is needed to devise a map with 
cross sectioning, or color scheme, to portray the 
essential facts. Spatial series are intrinsically 
more amenable to graphic treatment, and less to 
numerical treatment, than temporal or quantitative
series. The maps of the U. S. Coast and
Geodetic Survey, of the Weather Bureau, and of the
Census Bureau show the high degree of completeness,
variety, and detail of portrayal that is possible.
The groupings of territories in spatial series and
the subdivision of areas may follow conventional
procedure or the peculiar needs of the problem.
The order adopted by the Census Bureau in giving
population statistics is shown in Chapter IV, 
Table S. 

The third and fourth types of series are the 
qualitative and the quantitative which exist when 
the principle of sampling has either allowed for 
time and space by making them constant, or when it 
is considered that such differences in time and 
space, at which the successive observations have 
been made, are immaterial. Thus some important 
consideration other than, or in addition to, time 
or space has been utilized as the principle in 
getting the cases of the sample, and of course 
some definite principle has been followed in making 
the observations or taking the measurements of the 
case. If the resulting characterizations of the 
observations taken differ qualitatively one from 
another, the series is a qualitative or categorical 
series, whereas if they differ in terms of amount 
it is a quantitative series. Observations upon 
eye color, or hair color, or blood type, or sex,
or abnormality of fifth-grade school children, 
would yield qualitative series, whereas recording 
the height, or weight, or age, or psychological 
test score, or achievement test score, etc., would 
yield quantitative series. One immediately deducible
property of the items of a quantitative or
qualitative series is that they are not tied to 
specific time and place of origin, that is, these 
are neglected when studying them. 

Thus in every instance and for every sort of 
series the observations taken cover but a fraction 
of reality. If numbers of bushels of wheat are 
the elementary statistics, the animal life in the 
unit of area is not. If heights of twelve-year- 
old boys are the elementary statistics, their 
places of birth and dates of birth, except as 
such might define the principle of sampling 
twelve-year-olds, are not. And so it goes. We 
are continually studying aspects of total situ- 
ations, and it must be so, whether the method of 
study is admittedly statistical or not. 

When a research undertaking is planned, it 
will be found conducive to later precise treat- 
ment if an attempt is made to relate the issues 
to be studied to some one of the four types of 
series. If this is impossible, the statistical
handling of the resulting data becomes complex, 
though not necessarily insoluble. 

The classification here followed of simple 
series as being temporal, geographical (or, more 
generally, spatial), quantitative, and qualita- 
tive, is not identical with that made by certain 
statisticians: Yule (1929) divides statistics 
into the statistics of attributes and the statistics
of variables. These correspond to qualitative
and quantitative, as herein given. Jerome
(1924) divides series into historical and cross- 
section, and the latter he divides into geographical,
qualitative, and quantitative, thus
agreeing with the classification herein used. 
Another common classification is into discrete 
and continuous series. This is made by Garrett 
(1926), Thurstone (1925), Rietz (1924), and Young 
(1925). R. A. Fisher (1934, page 25) classifies 
certain series as temporal and again he classifies
them (page 4) as continuous and discrete. It 
should be noticed that quantitative and qualita- 
tive are not synonymous with continuous and dis- 
crete. The initial data in quantitative series 
may be continuous, as, for example, heights of
twelve-year-olds, or discrete, as, for example, 
the numbers of children in families; but the 
initial data in qualitative series can only be 
discrete. The merit of any classification is in 
its utility. In general, the reader will find 
that a definite set of statistical techniques 
follows from the type of series involved. This 
is true for the four types here employed. If the 
quantitative is further subdivided into continuous
and discrete, it will be found that there are 
separate statistical techniques which are in har- 
mony with each. 


SECTION 2. TYPES OF ISSUES 


We have considered the types of data occurring 
in the different series. Let us now inquire as 
to the differences in issues consequent to differences
in series types.

In time series one is concerned with fluctu- 
ation of some single function with changes in 
time. If the function is greater at one time
than another some measure of this excess is im- 
portant. If a natural zero point, or upper limit, 
such as a 100 per cent point, is not available, 
then some gross measure of difference must be 
employed, but in the very common case where one 
or both of these natural limits are known (limits
in terms of the amount of the magnitude, not 
limits in terms of the time at which the zero
amount or 100 per cent amount is attained), the 
obvious measure of relationship is the ratio of 
the magnitude of the function at one time to its
magnitude at another, and since the zero magni- 
tude at the beginning of growth is never an ob- 
served value, one does not secure infinite ratios. 
Should one secure an infinite ratio because the 
magnitude at some time after the start becomes 
zero, then the ratio as a statistic loses its 
value. In general, the ratio is a statistic 
peculiarly adapted to the study of temporal 
series. 

The price of a commodity at a given date, 
divided by the price at a second date, is such a 
ratio. If a second commodity for the same two 
dates is involved, a second price ratio is ob- 
tained. Composites and aggregates of such ratios 
or relatives are called (price) indexes, and 
these are essential tools for the study of time 
series. 

If the price in 1934 is expressed relative to 
the price in 1933, the year 1933 is called the 
basal year. The choice of such basal moments of 
time with reference to the magnitude at which 
time all other magnitudes are expressed is es- 
sentially a problem of time series. 
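
The arithmetic of price relatives and of the basal year can be sketched briefly. The figures below are invented for illustration, and the unweighted mean of relatives is only one of several possible ways of combining them into an index; it is assumed here for simplicity and is not prescribed by the text.

```python
def price_relative(price, base_price):
    """Ratio of a price to the price of the same commodity in the basal year."""
    return price / base_price


def simple_index(prices, base_prices):
    """Unweighted mean of price relatives, scaled so the basal year is 100.

    An assumed, simplest-possible composite; weighted forms are also used.
    """
    relatives = [price_relative(prices[c], base_prices[c]) for c in base_prices]
    return 100 * sum(relatives) / len(relatives)


# Hypothetical prices per bushel; 1933 is taken as the basal year.
prices_1933 = {"wheat": 1.00, "corn": 0.50}
prices_1934 = {"wheat": 1.20, "corn": 0.45}

print(price_relative(prices_1934["wheat"], prices_1933["wheat"]))
print(simple_index(prices_1934, prices_1933))
```

Changing the basal year changes every relative, which is why the choice of that year is itself a statistical question.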

There is frequently evidence of a systematic
change in magnitude over a considerable period of 
time. This is called a trend, and a line indi- 
cating it when the series is plotted is a trend 
line. The trend is an essential statistic of 
many a time series. 
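
One way of obtaining a trend line is sketched below. The text does not say at this point how the line is to be found; fitting a straight line by least squares is assumed here as a common choice, and the function name and data are hypothetical.

```python
def trend_line(values):
    """Least-squares straight line through equally spaced observations.

    Time is counted 0, 1, 2, ... in the order of the observations.
    Returns (slope, intercept).
    """
    n = len(values)
    times = range(n)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


# Hypothetical yearly magnitudes rising by about two units a year.
slope, intercept = trend_line([2.1, 3.9, 6.2, 7.8, 10.0])
print(slope, intercept)
```

The slope is the systematic change per unit of time; what remains after the line is removed is the material for the study of periodic fluctuation.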

There may be a regular order of fluctuation 
as time changes, yielding cyclical phenomena, and 
a study of these and their periods is peculiar
to time series. Any time series of appreciable 
duration (in studying etheric vibration .001 of 
a second would be a very appreciable duration) 
may be expected to show periodic fluctuations. 
As a consequence one of two procedures is necessary,
dependent upon whether it is desired (a) to
study the changes within a certain cyclical 
period, or (b) to study trends independent of 
such periodic changes. Illustrations will make 
the problem clear: 

(a) Let it be required to ascertain the nature
of the load of an electric-power generating plant 
during a twenty-four-hour period. The current 
consumed per hour for some one day could be tabu- 
lated or plotted. The result would have only 
such accuracy as would result from a single day's 
sampling. To obtain a more reliable picture, a
number of days could be combined and the tabu- 
lation made showing the average load for each 
hour of the 24. Obviously error might creep in 
here, for the load on a Monday would be quite 
different from that on a Saturday or Sunday, and 
perhaps different from that on the other days of 
the week. With due allowance for holidays, probably
a very satisfactory idea of the hourly fluctuations
of the Monday load could be obtained by
pooling results for several Mondays. Differences 
in daylight, temperature, etc., would make it 
unsound to combine all the Mondays in the year.
The problem cited is typical of temporal series 
problems, and the principle that should guide
one in pooling results should be to group as 
wide a range of data as are typical with respect 
to the characteristic under investigation, but 
not affected by other seasonal or systematic 
tendencies. 
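
The pooling described in (a) amounts to averaging, hour by hour, over several days judged to be alike. A minimal sketch follows; the loads are invented, only four hours are shown instead of twenty-four to keep the example short, and the function name is hypothetical.

```python
def hourly_profile(days):
    """Average load for each hour across several comparable days.

    `days` is a list of equally long lists, one list of hourly loads per day.
    """
    n_hours = len(days[0])
    return [sum(day[h] for day in days) / len(days) for h in range(n_hours)]


# Hypothetical loads for the same four hours on three nearby Mondays.
mondays = [
    [40, 55, 90, 70],
    [44, 53, 94, 66],
    [42, 57, 86, 74],
]

print(hourly_profile(mondays))  # [42.0, 55.0, 90.0, 70.0]
```

Only Mondays close together in the season are pooled, in keeping with the caution that differences in daylight and temperature make it unsound to combine all the Mondays of the year.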

(b) Let it be required to ascertain the nature
of the seasonal fluctuations of the load. In 
this case a tabulation by weekly units would be 
the best, as this would completely suppress both 
Saturday and Sunday and hourly idiosyncrasies. 
With this in mind, it is seen that a tabulation 
by six or eight-day or monthly periods would not 
be as satisfactory as weekly or bi-weekly periods.
The principle to follow is to use such a temporal
unit as equals or is an integral multiple of the
period within which occur the tendencies which 
it is desired to suppress. 
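
The principle just stated can be checked on a small invented series. In the sketch below the daily loads follow a fixed weekly pattern with no seasonal change whatever, so any fluctuation that appears between tabulation units is produced by the choice of unit alone. The data and the function name are hypothetical.

```python
def block_means(series, unit):
    """Mean of each complete block of `unit` consecutive observations."""
    n_blocks = len(series) // unit
    return [sum(series[i * unit:(i + 1) * unit]) / unit for i in range(n_blocks)]


# Hypothetical daily loads: five working days, then a lighter Saturday and Sunday.
weekly_pattern = [100, 100, 100, 100, 100, 70, 60]
daily = weekly_pattern * 6  # six identical weeks

by_week = block_means(daily, 7)     # unit equals the period to be suppressed
by_six_days = block_means(daily, 6)  # unit is not a multiple of that period

print(by_week)
print(by_six_days)
```

The seven-day units all give the same mean, so the week-end effect is completely suppressed; the six-day units differ from one another although nothing seasonal has happened.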

Without claiming that ratios, periods, trends, 
etc., do not occur, or are unimportant, in other 
than temporal series, still it should be noted 
how peculiarly intimate is the statistic of im- 
portance and the series type. This intimacy is 
apparent whether graphic or algebraic treatment 
is involved. 

The geographic series is much less amenable 
to algebraic treatment than the temporal, because 
the issue is connected with a place, not with a 
date which may be represented by a one-dimensional 
time continuum. An equally simple and well-known 
algebraic representation for geographical position
is not available. The tightness with
which the issue is connected with geographic 
position is the crux of the geographic series. 
A lessening of the bond should never be attempted. 
Accordingly, the two-dimensional map becomes the 
basic instrument in the treatment of geographical
series data. The richness and accuracy with 
which data may be presented upon a map is nicely
shown by the maps of the Coast and Geodetic 
Survey. No parallel neatness in algebraic treat- 
ment is available, but one statistic, the center 
of population, based upon a two-dimensional 
distribution of cases, has been worked out alge- 
braically, thus becoming a statistic which is 
unique to the study of geographical series. 
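
The center of population can be sketched as a weighted mean of positions. The fragment below assumes that the region is small enough to be treated as a plane, as the text allows for the ordinary geographical series, and that each place is given by plane coordinates and a count of persons; the names and figures are hypothetical.

```python
def center_of_population(places):
    """Population-weighted mean position of (x, y, population) triples."""
    total = sum(pop for _, _, pop in places)
    cx = sum(x * pop for x, _, pop in places) / total
    cy = sum(y * pop for _, y, pop in places) / total
    return cx, cy


# Hypothetical towns: coordinates in miles from an arbitrary origin.
towns = [
    (0, 0, 100),
    (10, 0, 100),
    (0, 10, 200),
]

print(center_of_population(towns))  # (2.5, 5.0)
```

Each count stays attached to its own location throughout the calculation, which is the bond between item and place that the geographical series requires.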

The typical qualitative series is obtained 
when alternative characters appear as a result 
of a single sort of sampling. Thus, if we ascer- 
tain and record the nationality of each person 
passing through a turnstile to see a baseball 
game, a number of nationalities (qualities,
categories) would be recorded with varying frequencies
in each. These qualities are not essentially
connected with moments of time or positions
in space, for, though noted at a specific
time and place, they persist after the subjects
have gone to a six o'clock dinner.

Furthermore, there is no obvious quantitative 
relationship between nationalities. What has one 
learned by such a tally? The numbers in each
category (a term used only when rubrics are dis- 
crete) or class (a term used when rubrics are 
either discrete or continuous). The numbers or 
the relative numbers in each class and the re- 
lationship of these from class to class are the 
only issues involved. Such a concept as the mean 
nationality does not enter in. What is the mean 
nationality of three Germans, four Irishmen, two
Frenchmen, one Japanese, and seven Americans?
The essence of qualitative data is that the items 
in it are not differences in amount on a single 
continuum.

To the biologist the concept of gene or the 
allelomorph is essentially that of the qualitative 
series antecedent to the mapping of the chromo- 
some, after which these become concepts of a one- 
dimensional spatial series. This nicely illus- 
trates a common characteristic of qualitative 
series. The items frequently are qualitative 
because of inadequacy of one's knowledge as to 
spatial or quantitative relationships between 
classes. One of the most fruitful lines of at- 
tack of qualitative data is the attempt to dis- 
cover a sense in which it is not qualitative. 
However, if no such discovery is made, there 
still remains the possibility of study of the 
series through the statistics essential to this 
type, namely, frequencies in a class, proportions 
in a class, and ratios of such proportions. 
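
The three statistics named, frequencies, proportions, and ratios of proportions, can be computed for the turnstile tally of three Germans, four Irishmen, two Frenchmen, one Japanese, and seven Americans. The sketch below is illustrative; exact fractions are used so that no rounding obscures the ratios, and the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Frequencies in each class, as tallied at the turnstile.
tally = Counter({"German": 3, "Irish": 4, "French": 2, "Japanese": 1, "American": 7})

total = sum(tally.values())

# Proportion of the whole sample falling in each class.
proportions = {name: Fraction(count, total) for name, count in tally.items()}

# Ratio of one proportion to another, for example Americans to Germans.
ratio = proportions["American"] / proportions["German"]

print(total)
print(proportions["American"])
print(ratio)
```

Nothing in the calculation orders the classes or places them on a scale, which is why no mean nationality can be formed from these data.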

Lastly we have quantitative series, always 
presenting two dimensions capable of quantitative 
study: (a) different amounts of some single 
thing, and (b) frequencies of occurrence of these 
different amounts. In the study of such series
we obviously have the possibility of all the 
issues connected with qualitative series, for, 
neglecting the quantitative relationship of the 
variable in question, we have data recorded 
in classes so that all qualitative series techniques
apply without any modification whatsoever.
The beauty of the quantitative series lies in 
that a much greater variety of problems presents 
itself, a much richer, more exacting, and a more 
descriptive set of statistics are available for
the analysis of relationships inherent in the
data. A typical quantitative series would result
if the scores of a certain class on a certain
achievement test are gotten. A record of each
score, together with the number of pupils receiving
it, constitutes the basic series.
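
The basic quantitative series, each score with the number of pupils receiving it, can be formed from a list of raw scores as follows. The scores are invented and the names are hypothetical; the mean is added only to show one statistic that the quantitative series permits and the qualitative series does not.

```python
from collections import Counter

# Hypothetical achievement test scores for a class of eight pupils.
scores = [7, 9, 8, 7, 10, 8, 8, 6]

# The basic series: (score, number of pupils receiving it), in order of score.
series = sorted(Counter(scores).items())

# Because the classes are amounts on one continuum, a mean is meaningful.
n_pupils = sum(freq for _, freq in series)
mean_score = sum(score * freq for score, freq in series) / n_pupils

print(series)      # [(6, 1), (7, 2), (8, 3), (9, 1), (10, 1)]
print(mean_score)  # 7.875
```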

Life's problems do not confine themselves to 
single series, and certain methods have been 
developed for handling problems which are complexes
of two or more of the four types mentioned,
but it is well to recognize that in general the 
problem and the method are functions of a single
series. 


SECTION 3. TYPES OF STATISTICAL PROCESSES 


The various processes of statistics may be 
listed in the order in which they take place: 
(a) definition and analysis of purpose, (b) collection
of raw data, (c) tabulation of raw data,
(d) computational procedures, (e) presentation
of results through the employment of summarizing 
statistics or graphic devices. 

(a) Definition and analysis of purpose: Antecedent
to the initial collection of facts is the
total life experience of the investigator in the 
light of which he has defined his problem with 
such detail that the hypothesis upon which he is 
building will be supported if the data yield
certain statistics; or, more rarely, but generally
with greater scientific insight, he has
multiple hypotheses, each in turn associated with 
different or alternative statistical outcomes. 
(b) The collection of data: The greatest care 
must be exercised to insure that the sample dealt 
with is "fair." It, of course, should be chosen 
because of some principle of sampling in mind, 
and the care should be that this, and not some 
unthought-of basis, is actually operating. This 
question may always be reduced to that of the 
comparability of the sample chosen, as observed, 
measured, or tested, with samples to be selected 
and measured in the future, or with samples that 
have been selected and measured in the past. The 
word "fair" implies a comparison. Thus, not one 
sample but at least two, and generally more, 
should be in mind, —on the one hand the immediate 
sample which the experimenter is about to use, 
and on the other hand the sample or samples with 
which comparison is to be made. Between a sample 
and its counter-sample, or contrast-sample, or 
over-and-against-sample, all conditions should be 
as nearly identical as possible except the one 
whose influence is being sought, that is, except 
the one under investigation. The existence of 
this counter-sample is inherent in all problems, 
though not specifically stated. Suppose one's 
expressed purpose is to find the mean 16-year-old 
level of accomplishment upon a certain psychological
test. Assuredly the value in doing this would
be because it enables comparison with other groups 
or individuals. If this counter-group is of 15-year-olds,
then they should differ from the first
group in that respect only, and not in such sun- 
dry other respects as sex, nationality, etc. The 
burden of insuring similarity between these two 
groups should not be entirely or mainly upon the 
shoulders of the person collecting the second 
group. It is essential at the time of the collection
of the first sample that the nature of
subsequent contrasting samples be in mind, and 
the more definitely the better. It is incumbent 
upon the person collecting the first sample to 
define it completely and fully. Failure to do so 
is irremediable, for no care exercised by the 
collector of a subsequent sample can allow for 
undefined conditions in the first sample. 

A different type of ability is called for to 
insure the proper collection of data from that 
necessary to its subsequent evaluation. For 
example, in securing scores upon a psychological 
or achievement test the collector should be 
sensitive to individual differences of his sub- 
jects which are abnormal, or more accurately, 
other than those assumed to hold. He should be 
sensitive to any unusual conditions under which 
the test is being administered, such as conditions
of light or state of discipline, and he should 
be sensitive to temporary attitudes or conditions 
of the subjects which are unusual. None of these 
types of sensitivity are called into play in the 
computational work which follows. It should not
be assumed that because a person is a good sta- 
tistical computer he is also a good collector of 
data. 

(c) The tabulation of data is usually a tiresome
and routine undertaking. There is, in the
first place, the assembling of the collected 
quantitative and qualitative items. Though the 
items have certain definitive individual charac- 
teristics, such, for example, as the name of the 
subject measured, the specific time and place 
gotten, these are usually considered irrelevant
to the issue, and are not retained in tabulation.
Just how much should be retained and how much 
discarded should not be decided upon the basis 
of the immediate needs of the investigator, but 
rather in the light of presumable or probable 
interests and needs of future investigators. 
Investigators generally err by publishing too
little original data. Examples of such failures 
are frequently present when sex of subject, or 
nationality, or even age, is not tabulated and 
made available to the reader, though each of 
these items is a part of the original record. 

There should be an initial tabulation of raw 
data which includes, in addition to the items 
which will be used by the investigator, such 
other items as are available and as may be perti- 
nent to the problem of workers in related fields. 
Then follow intermediate tabulations, choosing 
and organizing data for the purposes of the in- 
vestigator. And, finally, there are tabulations 
of the findings,—terminal statistics,—of the 
study. Questions connected with these various 
tabulations are treated at greater length in 
Chapter III. 

(d) The statistical processes involved in the 
evaluation of data range from such simple opera- 
tions as making frequency tables and plotting 
distributions, time curves, etc., to the involved 
algebraic undertaking of deriving appropriate 
formulas and standard errors for the particular 
statistics being dealt with. Most of this text 
is concerned with the issues connected with the 
algebraic treatment and evaluation of data. 

(e) The presentation of results: No matter
whether the magnitude shown is represented by a 
distance (graphic portrayal), a numerical amount 
(quantitative series), or represents a frequency 
in a class (qualitative series), there is always 
(1) some stated or implied counter-magnitude to 
which it is compared, and (2) some degree of 
credibility of the difference between the magni-
tude and its counter-magnitude is reached. The
most precise way in which this credibility is 
stated is in terms of the probability that a 
difference as great as that observed could have 
arisen as a matter of chance. Concepts (1) and 
(2) are implicit, when not explicit, in every
statistical evaluation. For example, to say that 
a child's mark in reading is 74 becomes meaning- 
ful only when some such added concepts are present 
as, first, the child is a fifth-grade pupil in a 
school wherein the passing mark is 70. There is 
also needed some idea of the variability of 
fifth-grade pupils in their ability; and, finally, 
the idea that the 74 is not a perfect represent- 
ation of the ability of the child in question. 
To precise concepts of the sort mentioned should 
adhere a further precise concept,—that giving 
the probability that 74 is sufficiently above 70 
that chance does not account for the excess. The 
method of reaching a precise statement of proba- 
bility cannot be given at this point, but it 
should be obvious that some such concept, precise 
or indefinite, is inherent in the interpretation 
of the trustworthiness of, and therefore the 
significance of, every statistical difference. 
The reader should develop the habit of appreci- 
ating that an answer in terms of probability 
exists for every observed difference between two
statistical measures, even though he does not 
have the necessary ability, or perhaps data, to 
arrive at a precise statement of it. Just as the 
error in statistical statements is ubiquitous, 
so should be the awareness by the student of this 
fact. 
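
The reading-mark example may be made concrete by a rough sketch, under assumptions which are not the author's: that the child's true ability lies exactly at the passing mark of 70, and that obtained marks scatter about the true value in a normal distribution with a standard error of 3 points. The figure of 3 is purely illustrative, as the text supplies none, and the function name is hypothetical. The chance of obtaining a mark of 74 or more is then the upper tail area of that normal curve.

```python
import math

def upper_tail_probability(observed, reference, standard_error):
    """Chance of a value >= observed when the true value is `reference`,
    assuming normally distributed errors of measurement."""
    z = (observed - reference) / standard_error
    # Upper tail area of the standard normal curve beyond z.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical figures: mark 74, passing mark 70, assumed standard error 3.
p = upper_tail_probability(74, 70, 3.0)
print(f"chance probability: {p:.3f}")
```

Under these assumed figures the probability is roughly nine chances in a hundred; a smaller standard error would make the excess of 74 over 70 more trustworthy, and a larger one less so.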

If the differences are great, as, for example, 
would be the case if the height of a normal adult
man is compared with that of a small child, and
if the fallibility of the measures is small, as
would be the case if the height of each is gotten
by precise methods, then the chance that the man
is not truly taller than the child but merely
seems so, due to the inaccuracy of observation,
is of the order, perhaps, of one in a million. 
The first concept is that of the difference, and 
the second that of its trustworthiness. Here
the second assumes a minor role, but the reader
should note that even here it is present. Further,
it is important to note that differences of this
obvious sort are such as society has long recog-
nized and become adjusted to, and they thus pos-
sess no particular interest. The interesting
issues quite invariably attach to differences of 
such amount that a certain element of uncertainty 
as to trustworthiness persists. The crude tech- 
niques of the remote past have established such 
facts as that adults are taller than children, 
but it has remained for refined procedures, care- 
fully expressing differences in terms of the 
chance that they may have arisen as a matter of 
chance, to indicate that, for example, the visual 
imagery of children is more variant than that of 
adults. The former issue, that adults are taller
than children, is so well established that it has
ceased to be a problem, while the latter is still
a matter of interest, and thinking about it in- 
volves both the concept of a difference and the 
certainty or trustworthiness with which it is 
established. These joint concepts are the two 
terminal points of statistical procedure. 


SECTION 4. THE SERVICE RENDERED BY ELEMENTARY 
AND BY ADVANCED STATISTICS 


The evaluation of data may involve the range 
of statistical processes from the most elementary
to the most abstruse. There is on the one hand 
a resort to statistics to make specific and 
quantitative a concept such as an average, and 
on the other hand to elucidate a relationship 
which the investigator no more recognizes as 
statistical than he does concepts of color, musi-
cal harmony, or beauty of form. On the other
hand, concepts may be developed de novo from the 
data at hand and from a consideration of mathe- 
matical relationships which were not in the mind 
of the investigator at the initial stage of his
study. The making specific and the testing out 
of these concepts represent a functioning of sta- 
tistics in its more powerful aspects. The ele- 
mentary student may well devote his effort to 
finding appropriate mathematical or statistical 
expressions for concepts or issues which have 
been suggested to him from nonstatistical sources. 
This calls for ingenuity and leads to a develop-
ment in thinking. As greater mastery and confi-
dence in adapting statistical techniques to
questions in mind develop, the student may find 
a reverse process occasionally takes place, and 
that issues are suggested by mathematical and 
statistical relationships discovered, and that 
these issues demand not only careful logical 
analysis, but quite frequently the collection and 
evaluation of new data. When this is the case, 
statistics is no longer merely a tool but also a
source of inspiration. Derivations of fundamental
formulas are the essential elements in training in
this latter stage, but they will play a minor role
in a first course. It is to be hoped that the reader
will be annoyed by the paucity of derivations in
this text, for that is a sign that the subject is
assuming a place in his processes of mental 
analysis and is not merely a set of mechanical 
rules of operation. 

The purpose of classification of series into 
types may not be apparent to the student at this 
stage of his study. As the study of graphic por- 
trayal, averages, measures of variability, fre- 
quencies and proportions in classes, etc., pro- 
ceeds, it will be found that there is a chain of 
dependent techniques and of terminal statistics, 
graphic or arithmetic, which are consequent to 
the type of series dealt with. Thus, as soon as 
the series is classified as of a type, the issues 
which are presumably of importance and the techni-
ques for working them up and presenting them are
suggested. 

It is true that much data may be classified
in more than one way. For example, the achieve- 
ment scores of pupils of different ages in differ- 
ent schools of a city may be classified as a 
quantitative series, a geographic series, or a 
temporal series. If the first, one variable is 
score and the other is frequency (age and school 
being neglected); if the second, one variable is 
school location as given on a map, and the other, 
achievement score—or the other might be number 
of pupils of a given age range or score range; 
if the third, the assumption is made that those 
of a given age are the same sort of individuals 
so far as achievement is concerned as those one 
year younger will become a year hence. Due to 
this assumption, this should be called a pseudo- 
temporal series, but the issues of interest, 
mental growth, etc., will be the same as those in 
a true temporal series. These three ways of 
looking at this simple body of data do not exhaust 
the points of view that may be taken. The student 
should realize that the series type into which 
the data fall is nearly as much a question of 
point of view from which viewed as it is a 
question of the intrinsic structure of the data. 
When data may be viewed as of more than one type, 
it frequently occurs that one or more of these 
ways of looking at it are more or less trivial, 
raising and leading to the analysis of no im- 
portant questions. 
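
The three points of view just described can be sketched upon a single small body of data. The records below are hypothetical, as are the school names and the helper function; the sketch merely shows the same records read as a quantitative series, a geographic series, and a pseudo-temporal series.

```python
from collections import Counter, defaultdict

# Hypothetical records: (age in years, school, achievement score).
records = [
    (10, "North", 52), (10, "South", 48), (11, "North", 60),
    (11, "South", 55), (12, "North", 66), (12, "South", 60),
    (10, "North", 48), (12, "South", 66),
]

def mean_by(key_index):
    """Mean score for each value of the field at position key_index."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

# Quantitative series: score against frequency, age and school neglected.
score_frequency = dict(sorted(Counter(score for _, _, score in records).items()))

# Geographic series: school location against mean achievement score.
score_by_school = mean_by(1)

# Pseudo-temporal series: age against mean score, read as if it were growth.
score_by_age = mean_by(0)

print(score_frequency)
print(score_by_school)
print(score_by_age)
```

Nothing in the data changes between the three summaries; only the variable chosen as the argument changes, which is the sense in which series type is partly a matter of point of view.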

A more detailed classification than into one 
of four is useful. The following has been made 
because in each instance one or more unique 
characteristics of treatment, or of evaluation, 
are connected with the type in question. 


Temporal
     T-t  = temporal series in which increases
            and/or decreases, not necessarily
            monotonic, are present. Include here
            T-t-r and T-t-i, temporal series
            involving ratios and indexes.

     T-c  = temporal series in which periodic or
            cyclical phenomena are present.

     T-g  = temporal series in which growth
            (monotonic) is present.


Spatial
     G-c  = geographic series wherein there is
            continuous variation in the variable
            from one geographic location to a
            neighboring location.

     G-d  = geographic series in which there is
            discontinuous variation in the variable
            from one geographic location to a
            neighboring location.

     [label illegible] = spatial series or one based
            upon location in three-dimensional space.


Qualitative
     Ql-f = frequencies in qualitatively different
            classes.

     Ql-m = magnitude of some function for
            qualitatively different classes.


Quantitative
     Qn-c = frequencies in quantitatively different
            classes of a continuum.

     Qn-d = frequencies in quantitatively different
            classes of a discontinuum.

     Qn-o = frequencies in the ordered classes
            of a continuum.


PROBLEMS 


As a vocabulary review the student may test
his knowledge of the following words and phrases: 


basal year
character
categorical series
characteristic
center of population
collection of data
continuous
continuum
counter sample
contrast sample
cycle
discrete
fair sampling
frequencies in a class
geographical series
growth series
index
natural limits
periodicity
price index
price ratio
price relative
probability
pseudo-time series
qualitative series
quantitative series
ratio
sampling
series
significance
spatial series
statistical series
tabulation of data
temporal series
time
terminal statistics
trait
trend
zero point


Think of series of the sorts T-t, T-c, etc. 
of preceding Section 4. In one or two sentences 
describe each. The effort should be made to think 
of simple situations. When giving an illustra- 
tion of one sort attempt to choose data which are 
so simple that no other series type is important 
in connection with it. 


SECTION 1. CHARACTERISTICS OF A GOOD TABLE
IN GENERAL


The chapter which follows deals with graphic 
methods and is concerned with charts, diagrams, 
graphs, etc., constituting pictorial represent- 
ations of statistical series. The statistical 
table is quite different. Its purpose is not 
directly to give a picture of a sequence, but to 
provide the basic data from which such a picture, 
or at least the outstanding features of such a 
picture, may be determined and visualized if
desired. The statistical table is simply a short- 
hand statement of facts. If a thousand or so 
facts of the sort, "The population of Aaber County 
is 4000;" "The population of Anthony County is 
3200;" "The population of Avery County is 4800;" 
etc., etc., are to be presented, they can not 
only be more concisely shown by tabulation, but 
several thousand additional facts, such as "The 
population of Anthony County is 800 larger than 
that of Aaber County" are presented at the same 
time and in an agreeably compact manner. The 
desire to accomplish double, triple, or manifold 
presentation by a single tabular arrangement is
the desideratum which imposes conditions and 
determines appropriateness of procedure. 

The same facts in regard to population are 
shown in the following five tables, and while 
not exhausting the possibilities of presentation, 
these will suffice to show the wide option which 
exists in presenting very simple data. 


TABLE III A

Populations and Areas of Counties

Counties | Population 1920 | Area in Sq. Miles
Aaber    |      4,000      |       480
Anthony  |      3,200      |       400
Avery    |      4,800      |       800
Bascomb  |     16,000      |       700
Brown    |      3,000      |       600


TABLE III B

Populations and Areas of Counties

Counties | Area in Sq. Miles | Population 1920
Aaber    |       480         |      4,000
Anthony  |       400         |      3,200
Avery    |       800         |      4,800
Bascomb  |       700         |     16,000
Brown    |       600         |      3,000


TABLE III C

Counties arranged according to Population

Counties | Population 1920
Brown    |      3,000
Anthony  |      3,200
Aaber    |      4,000
Avery    |      4,800
Bascomb  |     16,000


TABLE III D

Counties arranged according to Population

Counties | Population 1920
Bascomb  |     16,000
Avery    |      4,800
Aaber    |      4,000
Anthony  |      3,200
Brown    |      3,000


TABLE III E

Counties arranged according to Population

Population 1920 | Counties
     16,000     | Bascomb
      4,800     | Avery
      4,000     | Aaber
      3,200     | Anthony
      3,000     | Brown

As judged by a single purpose, no two of the
tables given are equally meritorious. If the 
table is to be used more frequently in abstracting 
information about various counties than as a 
means of comparing counties, i.e., if it is a 
reference table and not one pointing some con-
clusion, the items in the stub (the first column)
should be arranged alphabetically, as in Tables
III A and III B, in order to facilitate the finding
of items desired. If populations are more likely 
to be studied than areas, Table III A is prefer- 
able to Table III B, as the Population column 
holds a dominant position in Table III A. 

Should it be intended that the table be not 
primarily a reference table arranged to simplify 
the extraction of items of information, but, let 
us say, to point conclusions with reference to 
populations, Tables III C, III D, or III E are 
preferable to Tables III A or III B. If counties 
of large population are the chief consideration, 
Table III D is preferable to Table III C, as the 
first row of a table ranks higher in dominance 
than successive rows. Next in importance is the 
last row. Totals or averages are, because of 
their importance, frequently placed in the first 
row, but if other items demand this position or 
if captions (headings of columns) are less readily 
interpreted when separated from the body of the 
table by a row of totals or averages, then the 
bottom row may be used. 

As a means of pointing conclusions dependent 
upon populations, Table III E is to be preferred 
to Tables III C or III D, as the population data 
hold the dominant position in Table III E. 
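
The several arrangements of Tables III A to III E may be sketched as so many orderings of one body of data. The figures below are those of Table III A; the variable and function names are illustrative only and form no part of the text.

```python
# County data as given in Table III A: (county, population 1920, area in sq. miles).
counties = [
    ("Aaber", 4000, 480), ("Anthony", 3200, 400), ("Avery", 4800, 800),
    ("Bascomb", 16000, 700), ("Brown", 3000, 600),
]

def print_table(title, rows):
    """Print rows with the stub (first item) at the left of each line."""
    print(title)
    for row in rows:
        cells = [f"{c:>8,}" if isinstance(c, int) else f"{c:<8}" for c in row]
        print("  " + " | ".join(cells))

# III A: reference arrangement, alphabetical stub, population next to the stub.
table_a = sorted(counties, key=lambda c: c[0])
# III C: ascending population.  III D: descending, largest county in the first row.
table_c = [(name, pop) for name, pop, _ in sorted(counties, key=lambda c: c[1])]
table_d = [(name, pop) for name, pop, _ in
           sorted(counties, key=lambda c: c[1], reverse=True)]
# III E: the population itself becomes the stub, the most dominant column.
table_e = [(pop, name) for name, pop in table_d]

print_table("TABLE III A", table_a)
print_table("TABLE III D", table_d)
print_table("TABLE III E", table_e)
```

The data are identical throughout; the choice among the arrangements rests wholly upon which item the reader is expected to enter the table with and which he is expected to carry away.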

In general one should so draw up the table 
that the items in the stub and the captions 
constitute the argument or information with which 
the table is entered, and so that the column and 
row next to the stub and captions contain the 
most important items to be obtained from the 
table. Rows and columns more removed from these
dominant positions should contain less important 
data, except that the last row and last column 
may be given to data of first or second importance. 


Such Tables as III A and III B are primary or 
general-purpose tables, since they contain the 
raw data without abridgment, and may be used for 
various purposes. Such Tables as III C, III D,
and III E are derived from primary tables, such as
III A and III B, and by emphasizing certain facts 
serve a special purpose. These two types of 
tables should be recognized. The special-purpose 
table is always published because it conveys the 
point of the study. The general-purpose table 
should always be published also, as it provides
the only means of checking the author and of
discovering if other or further conclusions can 
be drawn. Several tables and many calculations 
may be involved between the primary and the final 
derived table. If full description of these 
intermediate steps be given it is not essential 
that these intervening tables and calculations 
be published. 


We have indicated that there are three types 
of tables, (a) the general-purpose table, or
table presenting original data without intent to
point a conclusion or emphasize some feature at
the expense of others; (b) the intermediate
table, or table which is derived from the general-
purpose table in the process of putting the data
in such form as to lead to a conclusion; and
(c) the special-purpose table, or table derived
from (a) and (b) so as to bring into relief some
conclusion. Now let us briefly note what consti-
tute desirable characteristics of all three, and
then of each separately. 


Principles of dominance and labeling are the 
same for all three sorts of tables. The approved 
lettering is such as can be read if the observing 
eye is in front of the bottom of the page. If
demands of typography require vertical lettering,
it should always be so as to be read from the
right-hand margin, that is, the eye is considered
to be at the right, and not at the left. This
principle of location of the observing eye at
the bottom or right applies equally to lettering,
whether horizontal, vertical, or oblique, of
graphs. The title should be at the top of the
page. When placed at the bottom, as is occasion-
ally done, it is in a position of second domi-
nance. The title should be brief and answer the
following question, which we may always assume
is in the reader's mind, "Is the subject matter
of this table such that I am interested in look-
ing at the table?" If brevity of title is incom-
patible with accuracy, there may frequently be
employed a subtitle in smaller type, or a foot-
note, thus preserving the brevity and large type-
size of the main title. The title is the thing that
the eye is supposed to rest on first as the page is
viewed. This same principle applies to titles 
of graphs. 


There are always two means of entry into the 
table. One is given by the items in the stub
(the left-margin column of the table) and the
other by the items in the captions of the columns.
Generally the means of entry involving the larger 
number of items is made the stub, so that the 
smaller number of items may be indicated in the 
captions. The column next to the stub is the 
dominant column, while the next in dominance is 
the last column or one at the extreme right of 
the table. The dominance of any column may be 
increased by enclosing it in heavy rules. The 
row next to the captions is the dominant row, 
while the next in dominance is the last row. The 
dominance of any row may be increased by heavy 
ruling or leading. 
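
The rule for allotting the two means of entry, and the order of dominance of columns, may be sketched as follows. The function names are hypothetical, and the ranking of the interior columns in left-to-right order is an assumption made for the sketch; the text states only that the column next to the stub ranks first and the last column second.

```python
def assign_entries(entry_one, entry_two):
    """Return (stub_items, caption_items), giving the stub the longer list."""
    if len(entry_one) >= len(entry_two):
        return entry_one, entry_two
    return entry_two, entry_one

def dominance_order(columns):
    """Columns ranked by dominance: the one next to the stub, then the
    last, then (by assumption) the interior columns from left to right."""
    if len(columns) < 2:
        return list(columns)
    return [columns[0], columns[-1]] + list(columns[1:-1])

counties = ["Aaber", "Anthony", "Avery", "Bascomb", "Brown"]
measures = ["Population 1920", "Area in Sq. Miles"]

stub, captions = assign_entries(measures, counties)
print("stub:", stub)
print("captions:", captions)
print("dominance:", dominance_order(["col 1", "col 2", "col 3", "col 4"]))
```

Here the five counties form the stub and the two measures the captions, as in Tables III A and III B.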


SECTION 2. SPECIAL CHARACTERISTICS OF THE THREE
TYPES OF TABLES


General-purpose table: There are certain 
special requirements of the general-purpose table. 
It is to present the original data in all its 
detail except for items considered to be imma- 
terial. For example, if John Doe, age 13.5, 
seated at desk 17 of Mary Brown's fifth-grade
class, secures a test score of 86, it may well 
be that the general-purpose table merely records 
the sex, the age, the grade, and the score, it 
being assumed that the name of the boy, that of 
his teacher, and the desk where seated are ir- 
relevant with reference to any of the sundry 
purposes with which readers may approach the 
table. The author may be incorrect, for some of 
his readers might be interested in, say, the 
teacher as related to pupils' scores. However
this may be, it is clear that it is incumbent
upon the author to anticipate such potential 
uses of his data, and in the light of the variety 
of usage that he can anticipate, construct his 
general-purpose table. He should seldom limit
the presentation so as to cover the single type 
of use to which he himself has put the data. If 
he does so limit it, then the table is still of 
value in permitting a reader to verify the author's 
computations, but this is probably a much narrower 
value than could have been given to it. 
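
The choice of what to retain in the general-purpose table may be sketched with the John Doe record above. The field names, the coding of sex as "M", and the function are illustrative assumptions; the values are those given in the text.

```python
# One raw record, using the John Doe example; field names are illustrative.
raw_record = {
    "name": "John Doe", "sex": "M", "age": 13.5, "desk": 17,
    "teacher": "Mary Brown", "grade": 5, "score": 86,
}

# Items the author judges material to readers of the general-purpose table.
RETAINED = ("sex", "age", "grade", "score")

def tabulate(record, retained=RETAINED):
    """Keep only the retained items of a raw record."""
    return {field: record[field] for field in retained}

print(tabulate(raw_record))
# A reader interested in teacher effects is served only if "teacher" is kept.
print(tabulate(raw_record, RETAINED + ("teacher",)))
```

An item once dropped at this stage cannot be recovered by any reader, which is why the retained list should reflect anticipated uses and not merely the author's own.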

We may not say that in publishing a general-
purpose table the author should have no purpose;
rather, he should have a variety of purposes,
as many as the uses that he can anticipate or
imagine. He should subordinate his own special
purpose and serve these many purposes in the 
arrangement of the table. 

Of first importance is consideration of the 
means of entry which his reader will be called 
upon to use. 
The devisor of the general-purpose table must
think of the concepts and classifications which 
he can count upon finding in the minds of his 
readers. He can certainly count upon their
knowledge of the alphabet. He can count upon 
their knowledge of chronological order, and, with 
somewhat less certainty, knowledge of certain of 
the major geographical relationships. He can 
believe that certain classes such as "live stock,"
"cereals," "physical measurements," "mental 
measurements," "maintenance costs," "instructional 
costs," etc., will be intelligible and will in-
clude about the same items for all of his readers.
In general, the arrangement and classifications 
of stub and captions are to be such as call upon 
knowledge of the sort mentioned. Alphabetic
entry is, in particular, of wide utility. Such 
emphasis as is consequent to the arrangement of 
the table should be for the sake of ease of entry
to the table, and not to emphasize some item 
therein. 

A logical arrangement involving the concepts
of subordination and coordination is usually a
psychological aid to entry. For example, captions 
such as these 


FARM PRODUCTS 


LIVE STOCK 


APPLES |PEACHES| PEARS | OTHER 


provide a number of co-ordinates under a single 
superordinate. The ruling and type size bring 
into relief the classification used and required 
of the reader in order properly to enter the 
table. Obviously the principle to follow is to 
make rows or columns contiguous that will gener- 
ally be used together. Another principle to be 
kept in mind, though it frequently must be vio-
lated, is that normal reading habits and muscular
development of the eye favor horizontal eye move- 
ment rather than vertical. 

In the general-purpose table there should be 
emphasis on the natural arguments used by and the 
presumed interests of the reader, while in the 
special-purpose table there may well be emphasis 
on the argument used by the author and upon his 
special interest. In the general-purpose table 
rulings, leadings, and placing in dominant po- 
sitions should be in the light of the reader's 
already established modes of thought, so that nothing
new is being taught him, while in the special- 
purpose table these same things are used with 
the view to teaching him something, or giving an 
emphasis that is new to him. 


Special-purpose table: The essential charac- 
teristics of the special-purpose table have 
already been indicated. It is designed to point 
a conclusion. It generally consists of derived 
statistics and presents a difference between two 
or more statistics. It may well be that there 
are many intermediate processes of assembly and 
computation between the items of the general- 
purpose table and the resulting special-purpose 
table. A sample of these intermediate processes, 
or a very thorough description of them may suf- 
fice. In any case, sufficient should be given 
so that the doubting reader will be able to start 
with the data of the general-purpose table and 
verify all the statistics given in the special-
purpose table. It is obvious that the special-
purpose table is always published, for it consti-
tutes the conclusion made by the investigator.
There is equal requirement in an article claiming
scientific value that the general-purpose table
be published. It may sometimes suffice in semi-
scientific publications intended for popular
consumption to forego publication of the original
data, if reference is made to some source, perhaps
a more technical journal, where it is available.
This procedure is a makeshift and not in general 
to be recommended. 

In the desire to place emphasis, great care 
must be exercised to prevent distortion. For 
example, a table admittedly emphasizing the size 
and importance of Myberg might be as follows: 


Cities        Population
Myberg        2  1  1  4
Ladome        14134
Momcup        11655
Defton        6758
Frilty        2084


The emphasis upon Myberg has been accomplished 
by recording in the dominant row. This, in it-
self, is not to be criticized, but emphasis by
recording the population in more expansive type 
horizontally than that used for the cities with 
which Myberg is being compared is misleading, or 
if the reader discounts the type size so accu- 
rately as not to be misled, it nevertheless has 
been annoying to him. A better presentation is
as follows: 


Cities Population 
Ladome 14,134 
Momcup 11,655 
Defton 6,758 
MYBERG 2,114 
Frilty 2,084 


Here again the author has emphasized Myberg, 
but he has now done so without sacrificing accu- 
racy. The significance of the population of 
Myberg is only sensed when the contrast sta- 
tistics are known. Thus it is equally important 
that the reader know the population of the other 
cities. These should accordingly be given in
such type and in such arrangement as best to en- 
able comparison with the population of Myberg. 
Whereas the item of entry to a general-purpose 
table has been left to the particular interest of 
the reader, an endeavor has been made in the 
special-purpose table to dictate this, in so far 
as the author is able to do so, by utilizing the 
available means of emphasis. Though the author 
may legitimately dictate the issue to be attended 
to in the special-purpose table, he is never 
justified in misrepresenting any quantitative 
relationships inherent in the data, or even in 
suppressing by omission or relegating to obscure 
positions data which are of most comparative im- 
portance. 
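
The better presentation of the Myberg figures may be sketched as follows. The populations are those of the text; the function name is hypothetical. Emphasis is given by capitals alone, while every figure receives the same format and width, so that no magnitude is visually exaggerated.

```python
# Populations as given in the text's example.
cities = {"Ladome": 14134, "Momcup": 11655, "Defton": 6758,
          "Myberg": 2114, "Frilty": 2084}

def special_purpose_rows(cities, emphasize):
    """Order cities by population and mark one city by capitals only;
    all figures are formatted identically, so none is distorted."""
    ordered = sorted(cities.items(), key=lambda item: item[1], reverse=True)
    return [
        (name.upper() if name == emphasize else name, f"{pop:>7,}")
        for name, pop in ordered
    ]

rows = special_purpose_rows(cities, "Myberg")
print(f"{'Cities':<10}Population")
for name, pop in rows:
    print(f"{name:<10}{pop}")
```

The contrast statistics remain in full view and in their true order, which is what permits the reader to judge the significance of the emphasized item.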

Intermediate tables: Such intermediate tables 
and computations as have been employed in reach- 
ing the conclusions shown in the special-purpose 
table, or in other terminal statistics, represent
the development of the plot, as it were. The 
author of a short story may revise and refine 
some elementary plot a score of times before it 
finds expression in his published work. The more 
hidden, though sound, the steps in its develop-
ment, and the cleverer it is, the more appealing
to the reader. Unhappily, in the scientific 
study no concealment of the steps of development 
is permissible. These should be fully described 
and, if quite involved, illustrated by examples 
which may include both tables and computations. 
The scientist is called upon to belittle his own 
genius by a full demonstration of the directness 
and simplicity of his processes. If he does 
otherwise he may be fawned upon by the Ladies’ 
Aid or the Up-and-Coming Dinner Club, but he will 
hardly aid the efforts of serious students. The 
student should remember that verifiable knowledge 
is the object of his endeavor, and the verifi- 
cation should be by others. Common experience 
indicates that occasionally there are capable
students whose enthusiasm or whose distaste for 
laborious presentation leads them to jump from 
finding to finding without stopping to prove 
their course, with the net outcome of detachment 
from, and lack of influence upon, fellow students
in the same field. 

It happens not infrequently that some statistic 
readily derived from the data given in a general-
purpose table will be used so generally as to
warrant its inclusion in such a table. For ex- 
ample, if the population of the states of the 
United States constitutes the data of an original 
table, there surely should be recorded in this 
table the population by divisions, and finally 
the population for the United States entire. 
These are derived statistics and do not constitute 
basic items in a general-purpose table. It may 
be believed, however, that they will be used 
almost universally by the readers of such a table. 
It is thus a courtesy to the reader to make the
necessary additions for him, and to record these 
totals. In so far as this is done, the table has
taken on a special-purpose function, one amply
justified by the generality with which these 
totals will be desired by readers in pursuing 
their separate interests. When such derived 
statistics are of such importance as to warrant 
inclusion in what is otherwise a general-purpose 
table, they generally warrant being placed in a 
dominant position, the row next to the captions, 
or the column next to the stub, or the last row 
or last column. They may even warrant inclusion 
in a bolder type-face or with different leading 
than the more detailed or primitive data which 
are basic to the table. Table IV S illustrates 
this, and it also gives the standard of U. S.
Census practice in dividing the United States 
into divisions and in the order in which the 
divisions and states are listed. The most common 
derivatives which may add to the usefulness of a
general-purpose table are totals, arithmetic 
means (or other averages), standard deviations 
(or other measures of variability), proportions, 
and percentages. The main principles involved in 
the construction of statistical tables here dis- 
cussed, as well as certain other points, were 
found in Bowley (1926), Crum and Patton (1925), 
Day (1920), and Young (1925). 
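
The inclusion of derived totals in dominant positions may be sketched as follows. The division and state names and the populations are wholly hypothetical placeholders, not Census figures, and the function is illustrative only.

```python
# Hypothetical populations (in thousands) for states grouped into divisions.
divisions = {
    "Division A": {"State 1": 120, "State 2": 340},
    "Division B": {"State 3": 510, "State 4": 90, "State 5": 200},
}

def with_totals(divisions):
    """Return table rows with the grand total in the first (dominant) row,
    then each division total followed by the states composing it."""
    grand_total = sum(sum(states.values()) for states in divisions.values())
    rows = [("TOTAL", grand_total)]
    for name, states in divisions.items():
        # Division totals are set in capitals, standing in for bolder type.
        rows.append((name.upper(), sum(states.values())))
        rows.extend((f"  {state}", pop) for state, pop in states.items())
    return rows

rows = with_totals(divisions)
for label, value in rows:
    print(f"{label:<14}{value:>8,}")
```

The basic items are untouched; the totals are merely additions made once by the author and so spared to every reader.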


PROBLEMS 


Problem 1. The student may test his knowledge 
of the subject matter of this chapter by giving 
himself a vocabulary test involving the following 
words and phrases. 


general-purpose table
special-purpose table
intermediate table
original table
derived table
dominance
order of dominance of portions of tables
stub
caption
observing eye
terminal statistics


Problem 2. Draw up a general-purpose table, 
presenting data as here described: In connection 
with a certain study the following items of in- 
formation for 300 American white delinquent boys 
in a Texas juvenile training school, ages 8 to 
20, were gathered. The order in which they are 
here mentioned is haphazard. 


name
height in mm.
length of head in mm.
strength of grip of right hand in lbs.
age in completed years and days of a partial year
score on a constructive ability test
speed of tapping (30 seconds with the preferred hand)
strength of vision of left eye - Snellen chart test
circumference of head in mm.
lung capacity in cu. inches
school grade
handedness (left or right)
strength of vision of right eye
strength of grip of left hand
number of scars on cranium
mental test score
breadth of head in mm.
pubertal development (pre-pubescent, pubescent, or post-pubescent)
weight in pounds


Assume that the users of the table do not know 
the names of the boys, so that an alphabetical 
arrangement by name of boy will not be service- 
able. The steps to follow are: 


1. Choice of title with or without subtitle. 

2. Selection of the two main lines of classification, 
and allocation of one to the stub and the other to 
the captions. 

3. The classification of items within the stub 
in harmony with the knowledge of the reader. 

4. A similar classification of items in the 
captions. 

5. An arrangement of items of stub in conformity 
with the principle of placing the most generally 
used items in dominant positions, and in keeping 
related items together. 

6. A similar arrangement of the items of the 
captions. 

7. The use of special rulings, leadings, type 
size, bold face, italics, and capitalizations to 
facilitate entry as it will presumably be made by 
the reader. 

8. The inclusion or non-inclusion of certain 
derived measures in the light of the generality 
with which it is expected they will be used. 

9. The actual lettering of title, stub, and 
captions in order to supplement ease of 
entering, as planned for in the preceding 
eight steps, and in harmony with the princi- 
ple of the location of the observing eye. 


Problem 3. Draw up a special-purpose table, 
using the accompanying data taken from Brigham, 
Carl C., A Study of American Intelligence (1923), 
p. 120. Your purpose is to show comparative 
intelligence test scores for certain Nordic and 
Mediterranean strains, having defined country of 
origin as representing strains as indicated below. 
Mean army intelligence test scores of certain 
drafted men according to country of origin were 
found to be as follows: 


Nordic: Belgium 12.8, Canada 13.7, Denmark 
13.7, England 14.9, Holland 14.3, Norway 
13.0, Scotland 14.3, Sweden 13.3. 

Mixed: Austria 12.3, Germany 13.9, Ireland 
12.3, Poland 10.7, Russia 11.3. 

Mediterranean: Greece 11.9, Italy 11.0, 
Turkey 12.0. 


What statistic or information in addition to 
that given in the preceding problem do you con- 
sider would be of prime value in serving the 
stated special purpose? (The answer to this 
question may appropriately be postponed until 
the next six chapters have been studied.) 


CHAPTER IV 
GRAPHIC METHODS 
SECTION 1. GENERAL FIELDS IN WHICH SERVICEABLE 


An essential difference between a written 
description of a landscape and a painting of it 
is that the reader exercises minimum initiative 
in determining what is noted, while the viewer 
of the painting, no matter how subtly and suc- 
cessfully the high lights have been drawn, does 
take an active and personal part in determining 
the focus of attention. The wider the experience 
and training of the observer the more likely 
that the feature of most worth in the painting 
will be apprehended. For the fullest enjoyment 
not only must the artist have been an expert in 
design and execution, but the observer must be 
expert also. 

For fullest utility, the graphic portrayal of 
statistical data is dependent upon skill in 
presentation and commensurate skill in interpretation. 
When both of these are present the 
enjoyment and fullness of meaning gotten in 
reading a graph exceeds that in reading a description 
of the same data. We can, very reasonably, consider 
the meaning gotten from reading a graph as a 
product of d, m, and r, wherein d represents the 
data with its complete meaning, m, a quantity 
between 0 and 1, the excellence of the art of 
the maker of the graph, and r, also between 
0 and 1, the expertness of the reader of it; thus 
dmr < d if either m or r falls short of 1. 

If the reader of the graph is quite untutored 
the maker must resort to stars, arrows, colors, 
heavy lines, light lines, shading, verbal addenda, 
pictograms, etc., to make the story clear and 
make the reader's task as light as possible. If 
the reader is inexpert he is correspondingly 
gullible and the maker is under a heavy obli- 
gation not to resort to the little tricks of 
bias, under and over-emphasis, so available and 
so misused by the promoter. 

Various graphic devices are listed in Table 
IV A, arranged vertically according to felt ease 
of understanding, that is simplicity as the lay 
reader conceives it, and horizontally according 
to field served. 

Distorted pictures, maps, and cartoons with a 
quantitative element may be designed to relieve 
the reader of necessity for search for and correct 
appraisal of the significant feature, but they 
may also be designed to suppress or exaggerate 
phenomena and to prevent correct appraisal. The 
demarcations between the inaccurate and the 
accurate, and between the popular and the techni- 
cal, in graphic portrayal are hair lines which, 
though willfully crossed by promoters, are also 
often crossed inadvertently by amateur statis- 
ticians. 

When the nature of the data permits, the 
picturing of facts conveys a readier comprehension 
than does a tabular array of figures. Since there 
are but two dimensions to the surface of a sheet 


TABLE IV A 
COMMON GRAPHIC DEVICES 

SERIES TO WHICH APPLICABLE: Quantitative, Qualitative, 
Temporal, Geographical and Spatial 

Easiest to understand as judged (or misjudged) by reader 
    Quantitative: Distorted pictures. Pictogram. Histogram. 
        Frequency polygon. 
    Qualitative: Pictogram. Circle chart. Bar diagram. 
        Segmented bar diagram. 
    Temporal: Pictures showing changes with time, no precise 
        labeling of either axis. Time chart. 
    Geographical and Spatial: Map with pictures at sundry 
        locations. Oversimplified map. 

Of intermediate difficulty, requiring some explanation if 
presented to a popular audience 
    Quantitative: Ogive, or percentile curve. 
    Qualitative: Very simple 2-way plot. Block diagram. 
    Temporal: Relative time chart. Composite price index or 
        aggregate chart. Growth curve. Genealogical chart. 
    Geographical and Spatial: Stereoscopic presentations. 
        Vegetation, rainfall, barometric maps, etc. Geodetic 
        and Coast survey map. Road map. Architect's working 
        drawing. 

Understanding calls for a certain degree of technical training 
    Quantitative: Smoothed curve. Mathematically fitted curve. 
        All presentations involving 2 or more variables in 
        addition to frequencies: Correlation table, etc. 
        Alignment chart. 
    Qualitative: Contingency table involving 2 or more 
        variables in addition to frequencies. 
    Temporal: Geological chart. Semi-logarithmic chart. 
    Geographical and Spatial: Mercator and other projections. 
        Navigation and weather maps. 


of paper, ordinarily but two variables are depicted 
in a single graph. 
SECTION 2. TEMPORAL SERIES 


Time chart: Consider the accompanying data 
giving the maximum temperatures recorded by the 
Weather Bureau for each day in July and August, 
1917, for New York City. 


TABLE IV B 
MAXIMUM TEMPERATURE FOR EACH DAY IN 
DEGREES FAHRENHEIT 


July 1-Aug. 31, 1917 
New York City 


July  1   80      July 17   87      Aug.  1   98      Aug. 17   85 
      2   88           18   80            2   96           18   80 
      3   7[?]         19   77            3   83           19   81 
      4   78           20   83            4   80           20   84 
      5   81           21   81            5   82           21   85 
      6   80           22   86            6   82           22   80 
      7   79           23   86            7   88           23   76 
      8   70           24   86            8   78           24   83 
      9   75           25   84            9   83           25   82 
     10   65           26   85           10   80           26   74 
     11   66           27   90           11   82           27   82 
     12   71           28   80           12   83           28   80 
     13   81           29   81           13   83           29   83 
     14   81           30   95           14   78           30   81 
     15   [?]          31   98           15   81           31   75 
     16   85                             16   80 


If it is desired to study diurnal changes in 
maximum daily temperatures, a graph is made in 
which the abscissa represents the days in order, 
July 1, July 2, etc., and the ordinate represents 
the temperatures, 0°, 1°, 2°, etc. For July 1 
the ordinate is 80, for July 2, 88, etc. A line 
connecting the successive ordinates provides a 
picture of the changes in maximum temperature 
throughout the two months. 


CHART IV I 
DAILY MAXIMUM TEMPERATURES 
NEW YORK CITY, JULY 1-AUGUST 31, 1917 


CHART IV II 
DAILY MAXIMUM TEMPERATURES 


Chart IV I is such a graph and is called a 
Time Chart. Its two dimensions are time and 
magnitude of something as it changes with a change 
in time. 

Certain detailed features of charts, zero 
points, scales used, rulings and proportions: 
As the initial and final dates do not represent 
the beginning and end of time, but are determined 
by the author’s whim, there should be no heavy 
vertical rules bounding the curve. A little 
space should be left between the initial plotted 
point and the left vertical rule of the chart 
and between the final point and the right rule 
of the chart, as this gives the entirely correct 
impression that the curve is dangling in time and 
that both earlier and later observations were 
possible and might have been plotted. The double 
wavy line near the bottom of the chart is drawn 
to convey the information that the temperature 
scale has been broken and that the actual zero 
point on the temperature scale is much lower down 
than indicated by the heavy horizontal rule at 
the bottom of the chart. 

The omission of a large portion of a vertical 
scale is sometimes dictated by limitations of 
space available. For accurate portrayal it is 
desirable that a natural zero point be the zero 
point of the raw scores or measures and that 
graphic distances be correct representations of 
distances above this point. Following this 
principle in the case of these temperature data, 
we measure temperatures from absolute zero, -460° 
Fahrenheit. The temperatures then appear: July 1, 
540° above the zero point, July 2, 548° above, 
etc. The vertical dimension would then be some 
nine times as great as shown, unless this scale 
is reduced, and if reduced to 1/9 the Chart IV I 
scale we get Chart IV II, which is not very helpful 
to a prospective summer visitor to New York 
or to a salesman of air-conditioning equipment. 
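
The arithmetic of this shift of zero point can be checked in a few lines. What follows is a minimal sketch in Python, not part of the original text; the function name is hypothetical, and only the two readings quoted above (80° and 88°) are used. 

```python
# Absolute zero on the Fahrenheit scale, rounded as in the text.
ABSOLUTE_ZERO_F = -460


def above_absolute_zero(temp_f):
    """Return a Fahrenheit reading measured from absolute zero."""
    return temp_f - ABSOLUTE_ZERO_F


# The two readings quoted in the text (July 1 and July 2, 1917).
readings = {"July 1": 80, "July 2": 88}
rescaled = {day: above_absolute_zero(t) for day, t in readings.items()}

# Share of the full ordinate (measured from absolute zero) that is taken up
# by the 8-degree change between the two days.
spread = max(rescaled.values()) - min(rescaled.values())
share = spread / max(rescaled.values())

print(rescaled)          # readings measured from absolute zero
print(round(share, 3))   # fraction of the ordinate showing the change
```

The change between the two days occupies under two per cent of the ordinate once the scale starts at absolute zero, which is why the rescaled chart conveys so little. 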




Natural zero points for one purpose may not 
be natural for another. For a physicist, -460° 
may be a natural zero point on the temperature 
scale. For a banana planter, perhaps 32° is the 
important and, for his purposes, the natural 
point of reference. For a physician or hospital 
nurse, perhaps 98.6° is such a point. The writer 
will not attempt to define "natural zero point," 
for the definition must depend upon the particular 
phenomena and purpose in hand, but he would 
point out that there are such points and that 
they are the points from which scores or measures 
should deviate for the most meaningful presentation. 
In the field of prices an obvious zero 
point is $0.00. In situations where the top 
value is necessarily limited, as for example is 
100 per cent of a quantity, this value becomes a 
natural point of reference. There may be two 
natural points on the scale. In dealing with 
quantities that are percentages of a total, both 
0 per cent and 100 per cent are such points. 
It is to be noted that in the matter of maximum 
temperatures 100° is not a limit and Charts IV I 
and IV II as drawn are not bounded at the top by 
a heavy horizontal rule. Where there is such an 
upper limit a heavy rule representing it is desirable. 
The ogive curve shown in Chart IV XIII 
is representative of this situation, it being 
bounded at 0 and 100 per cent by heavy rules 
at left and right. If, for the artistic purposes 
of make-up, it seems desirable to bound the curve 
with a rule at the top, it should be well above 
the maximum ordinate of the curve so as to give 
as little impression as possible of being a part 
of the data. 

Ordinarily the vertical rulings of a time 
chart should be without emphasis, as one date is 
neither more nor less important than another. An 
exception can be made in the case of a relative 
time chart, as shown in Chart IV IV, where prices 
or magnitudes are expressed relative to that at 
some basal date, the price or magnitude at this 
date customarily being called 100. In this 
instance the vertical rule at this date may be 
drawn heavier than for other dates, as shown in 
Chart IV IV. A second instance arises in connection 
with growth curves, for here the beginning, 
such as the date of birth, is a natural 
starting point and may well be indicated on the 
chart by a heavy vertical rule. 

For all the common charts, except circle diagrams, 
the scales used are at the discretion of 
the maker. It has been claimed that the greatest 
artistic balance is reached if the ratio of over-all 
height to over-all width is the golden mean 
provided by the extreme and mean ratio. This 
ratio, x/(1-x) = (1-x)/1, holds when (1-x)/1 = 
.618/1.00, or approximately 3/5. General experience 
supports the view that a pleasing result is 
obtained if the height of a curve is 3/5 its 
width. In Chart IV II, if attention is fixed 
upon the curve only and not upon the bounding 
rules, the ratio of height to width is about one 
to thirty, which is altogether undesirable. 
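
The value .618 follows from the proportion just given. The following is a minimal sketch in Python, not part of the original text; the symbol y (standing for 1-x) and the helper chart_height are introduced here only for illustration. 

```python
import math

# Let y stand for (1 - x). The proportion x / (1 - x) = (1 - x) / 1 becomes
# (1 - y) / y = y, that is, y**2 + y - 1 = 0. Its positive root is:
y = (math.sqrt(5) - 1) / 2
x = 1 - y


def chart_height(width, ratio=y):
    """Height giving the extreme-and-mean ratio for a chart of this width."""
    return width * ratio


print(round(y, 3))                  # the ratio of height to width
print(round(chart_height(10), 2))   # height for a chart 10 units wide
```

The rule of thumb of 3/5 differs from this root by less than two hundredths, so either serves for laying out a chart. 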

The lettering of a chart follows the same 
principles as those applying to a table. A chart 
may require an explanatory "legend" as shown in 
Chart IV IV. The upper left corner is an excellent 
place for such a legend, but it may be 
placed in any free region. Vertical and hori- 
zontal scale rulings should be reduced to a 
minimum and given in very light lines, except 
that some slight emphasis may be given to round- 
number rulings usually represented by every fifth 
or every tenth line. 

Limitations of the time chart: Let us plot 
the several data of Table IV C upon a single time 
chart. The result is shown in Chart IV III, 
where three different ordinate scales have been 
employed, one for wages, one for steak prices, 


TABLE IV C 
PRICE AND WAGE DATA 
CHICAGO* 


        Entire U. S.      Av. Yearly       Union Wage per Hour 
        Dunn's Wholesale  Retail Price                Linotype 
Year    Price Index       Round Steak      Painters   Operators 

1907    $107.264 
1908     113.282 
1909     111.848                           55         50 
1910     123.434                           60         50 
1911     115.102                           60         50 
1912     123.438                           60         50 
1913     120.832                           65         50 
1914     124.528                                      50 
1915     124.168 


* U. S. Dept. of Labor, Bur. of Labor Statistics. Union Scale 
of Wages and Hours of Labor, 1916. 


CHART IV III 


INCREASES IN WHOLESALE PRICES, RETAIL PRICES AND WAGES 
CHICAGO, 1907-1916 


and one for wholesale prices. The advantage of 
this presentation over one showing three charts 
lies in a saving of space and in nothing else. 
In fact, there is a decided disadvantage in having 
three such separate scales upon a single chart. 
Accurate comparisons between the wholesale, 
retail steak, and wage curves are not possible 
by this chart, though the existence of the three 
curves in a single chart encourages the reader 
to attempt to make them. The test of any graphic 
presentation is that the judgment of magnitude 
consequent to the visual impressions received 
is correct. By this test Chart IV III is nearly 
a complete failure, the comparison of wage of 
linotype operators with that of painters being 
the only inter-curve comparison that is sound. 

Not infrequently interest in, or, we may say, 
importance for our purposes of, temporal data 
decreases as more and more remote time is considered. 
This decrease may be of much the same 
order as the decrease in size when things are 
viewed in perspective. When temporal data are 
so shown we have a retrospective chart. It has 
points of similarity both with the semi-logarithmic 
chart and the block diagram, as discussed 
by Karsten and Brooks in "Retro" Charts (1943). 

The relative time chart: The relative time 
chart is designed to enable comparisons of the 
relative changes in different magnitudes accompanying 
a change in time. It does permit of such 
comparison, but only with reference to some definite 
date which has been chosen as the basal date. 
The basal year may be any date within or without 
the range of time presented. There are four 
common bases for the choice of one certain date 
rather than a second, and all of these have the 
merit of choosing a moment in time when the phenomena 
are, in some respect, more stable than at 
neighboring moments. Temporal phenomena have 
the habit of oscillating, though usually very 
irregularly, as time changes. Such a function, 
if continuous, has a moment of no change at each 
maximum and minimum and a moment of uniform change 
at each point of inflexion. These points have 
features of stability which are advantageous in 
a point of reference; the point of inflexion of 
a smoothed temporal series would seem to give an 
especially excellent and stable point for a study 
of trends. In economic phenomena the height of 
inflation, the bottom of depression, and a date 
between the two when the rate of change seems 
regular are common basal dates. The fourth reason 
for a certain choice is custom, which in turn is 
commonly related to one or another of the three 
conditions mentioned, though it may be so related 
in terms of much more varied data than that given 
by a single variable. 

We will show increases of the prices and wages 
recorded in Table IV C relative to their values 
in 1907, which date becomes the basal year. We 
first compute, from the data of Table IV C the 
ratios as given in Table IV D and then plot as 
shown in Chart IV IV. 
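
The computation of such ratios is simple division by the basal-year value. The following is a minimal sketch in Python, not part of the original text; the function name is hypothetical, and only the first three wholesale index values of Table IV C are used, taking the first two as belonging to 1907 and 1908. 

```python
def relative_to_base(series, base_year):
    """Express each value as a whole-number ratio to the base year (base = 100)."""
    base = series[base_year]
    return {year: round(100 * value / base) for year, value in series.items()}


# Dunn's wholesale price index, first three years of Table IV C
# (assumed here to be 1907, 1908, and 1909).
wholesale = {1907: 107.264, 1908: 113.282, 1909: 111.848}

ratios = relative_to_base(wholesale, 1907)
print(ratios)
```

The basal year necessarily comes out at 100, and every other year is read as a percentage of it. 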


CHART IV IV 
INCREASE IN PRICES AND WAGES, RELATIVE TO 1907 


TABLE IV D 
PRICE AND WAGES EXPRESSED AS RATIOS,* 1907 AS BASE 
CHICAGO DATA 


        Dunn's            Av. Yearly       Union Wage per Hour 
        Wholesale         Retail Price                Linotype 
Year    Price Index       Round Steak      Painters   Operators 

1908    106               10[?]            100        100 


* The decimal point is omitted, as is usual, so that a "ratio" 
of 106 means in fact 106/100. 


The vertical rule for 1907 is made heavy be- 
cause this date is different from any of the 
other dates, it being the basal year. Also the 
horizontal rule for 100 is heavy because this 
also is unique, all distances above or below 
being correct measures of percentage change from 
the 1907 status. 

The relative time chart has certain inherent 
shortcomings. Changes relative to some other 
date than the basal date are frequently of interest, 
but are not shown, and second, a given 
distance below the 100 rule, representing as it 
does the same proportionate change as an equal 
distance above, becomes misleading. If the 1907 
wholesale price, $107.26, doubles, it increases 
100 per cent, becoming $214.52. If this price 
halves, it decreases by but 50 per cent. The 
price must fall to $0.00 to decrease 100 per cent. 
A change to $214.52 and a change to $0.00 are 
represented by the same distance on the relative 
time chart, but certainly the economic upheaval 
in the one case is in no sense equal, but opposite 
in nature, to that represented by the other. 
Advantageous as the relative idea is, it is 
nevertheless grossly inadequate if great inequality 
in relative change of items is present. 
The relative time chart technique should be 
limited to situations wherein percentage changes 
are small, or at least of a somewhat uniform 
nature for the different variables involved. 
Where relative changes are large, a more informative 
graphic presentation, though one calling 
for greater knowledge upon the part of the reader, 
is by means of a semi-logarithmic time chart or 
semi-logarithmic relative time chart. 
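
The lack of symmetry in percentage changes, and the symmetry that logarithms restore, can be seen with the 1907 wholesale price. The following is a minimal sketch in Python, not part of the original text; both function names are hypothetical. 

```python
import math


def percent_change(old, new):
    """Change from old to new as a percentage of old."""
    return 100 * (new - old) / old


def log_change(old, new):
    """Difference of common logarithms, log10(new) - log10(old)."""
    return math.log10(new / old)


base = 107.26                        # the 1907 wholesale price from the text
doubled, halved = 2 * base, base / 2

# Doubling and halving are unequal as percentages ...
print(percent_change(base, doubled), percent_change(base, halved))
# ... but equal and opposite as differences of logarithms.
print(log_change(base, doubled), log_change(base, halved))
```

On a logarithmic ordinate a doubling and a halving therefore occupy equal distances above and below the starting level. 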

Semi-logarithmic charts: The characteristic 
of semi-logarithmic charts is that the scaling 
in one dimension, usually the ordinate, is proportional 
to the logarithms of a variable, while 
in the other dimension, which is frequently time, 
the variable itself is represented. If the 
scaling in both dimensions is logarithmic the 
chart becomes a logarithmic chart. Since, for 
all positive numbers, differences in the logarithms 
of numbers exactly parallel proportional 
differences in the numbers themselves, logarithms 
are peculiarly appropriate in studying situations 
wherein equal proportionate changes have equal 
psychological or economic significance. Semi-logarithmic 
cross-section paper is an aid in 
making these charts, as it is then not necessary 
to look up the logarithms of the magnitudes 
plotted. This aid is quite unnecessary as the 
looking up of logarithms prior to plotting is a 
very simple task. Even this small labor may be 
avoided at the cost of some accuracy if equal 
spacings upon ordinary cross-ruled paper are 
labeled with numbers whose logarithms differ by 
approximately equal amounts as given in Table 
IV E for a thirteen- and a twenty-division scale. 
Another simple way to make one's own logarithmic 
paper is to scale blank paper according to the 
divisions on the slide of a slide rule. 
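
Labels of this kind can be computed directly. The following is a minimal sketch in Python, not part of the original text; it assumes that each scale divides one logarithmic cycle, from 1 to 10, into the stated number of equal steps, which may not be exactly how the published table was built. The function name is hypothetical. 

```python
def equal_log_labels(divisions, places=2):
    """Labels for equally spaced rulings covering one cycle, 1 to 10.

    Successive labels have logarithms differing by 1/divisions
    (an assumption about how the scale is laid out).
    """
    return [round(10 ** (k / divisions), places) for k in range(divisions + 1)]


twenty = equal_log_labels(20)
thirteen = equal_log_labels(13)

print(twenty[:4])     # first few labels of the twenty-division scale
print(thirteen[:4])   # first few labels of the thirteen-division scale
```

Multiplying every label by a constant, say 100 for dollars or 10 for cents, leaves the spacing unchanged, since it adds the same amount to every logarithm. 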


TABLE IV E 

NUMBER VALUES CORRESPONDING TO EQUAL 
LOGARITHMIC DIFFERENCES 

Thirteen-division scale        Twenty-division scale 
to 2+ places                   to 2+ places    to 3 places 

A benefit derivable from equally spaced rulings, 
labeled as in the thirteen- or twenty-division 
scales given, is that two curves whose relative 
changes are important, but which would be widely 
separated if a single logarithmic scale is used, 
may be brought close together, using an equally-spaced 
grid and labeling differently for the two 
variables. For example, the wholesale price in 
dollars and the retail steak price in cents 
of Table IV C and Chart IV V can be shown by two 
curves in close proximity upon an equally-spaced 
grid by labeling a left-hand ordinate for wholesale 
prices and a right-hand ordinate for retail 
steak prices, as follows: 


    WHOLESALE PRICE        RETAIL STEAK PRICE 

                                25.2¢ 
                                22.4¢ 
         $159                   20.0¢ 
         $141                   17.8¢ 
         $126                   15.9¢ 
         $112                   14.1¢ 
         $100                   12.6¢ 


The wage and price data are plotted on semi-logarithmic 
cross-section paper in Chart IV V. 
Any year may be chosen as a base and the changes 
in heights of the curves noted for any other 
year in which one is interested. The distances measuring 
these changes are correct representations of the 
magnitudes of the proportionate changes between 
the two dates. This flexibility in choice of the 
basal year and the accuracy with which proportional 
changes are shown are merits not possessed 
by the relative time chart. A disadvantage 
in Chart IV V as drawn is that a comparison of 
changes in different curves requires a comparison 
of linear distances that are considerably removed 
in space from each other. This difficulty can 
be avoided by plotting the curves upon tracing 
paper and then shifting each curve up or down as 
may be necessary until coincidence is attained 
for some desired date. 


CHART IV V 
RELATIVE INCREASES IN WHOLESALE PRICES, RETAIL PRICES AND WAGES 


As a further illustration of the use of logarithms 
let us consider stocks having closing 
prices as follows: 


TABLE IV F 
CLOSING STOCK QUOTATIONS 

            Previous day's     This day's        Change 
            closing price      closing price 
Stock A     12 1/2             13                + 1/2 
Stock B     162 1/2            169               +6 1/2 


Neither an examination of these figures not in- 
volving a computation with them, nor an exami- 
nation of a time chart graphically showing the 
prices will be very helpful for such issues as 
ordinarily would interest one. The fact that 
Stock B increased $6 more than Stock A is proba- 
bly not of importance. The proportionate change, 
not the absolute change, in price is the item of 
moment to economist or investor. The computation 
of the ratio of the price at one date to that at 
a second, and a comparison of such "price ratios" 
is a common practice and does serve an immediate 
need, but in general is quite inadequate in that 
the second date, or basal date, must be changed 
with each change in issue or interest. The issues 
become quite simple and interpretations very 
direct if one but thinks in terms of logarithms 
of prices instead of the prices themselves. For 
the quotations cited the data are: 


TABLE IV G 
LOGARITHMS OF CLOSING STOCK QUOTATIONS 

            Logarithm of       Logarithm of      Change 
            previous day's     this day's 
            closing price      closing price 
Stock A     1.0969             1.1139            .0170 
Stock B     2.2109             2.2279            .0170 


We immediately see that the proportionate increases 
are equal. Without any change in computation 
as basal dates in which interested change, 
any two prices may be compared and changes for 
different time intervals may be compared. The 
difference given, .0170, is the logarithm of the 
proportionate change in price. The number whose 
logarithm is .0170 is 1.040, informing us that 
the second price is 1.040 times the first, or 
that there is a 4 per cent increase in price in 
the case of both stocks. The relationships so 
clearly brought out by logarithms are graphically 
portrayed when amounts are plotted on coordinate 
paper with a logarithmic scale. The statistically-minded 
will hope that a daily paper will 
some day publish the logarithms of market quotations 
themselves. For many purposes, including 
writing checks and receiving wages, it is necessary 
to deal with the ordinary number system and 
such things as dollars and cents, but for the 
study of relative changes in things, such as 
prices, we are better served by the logarithms of 
magnitudes than by the magnitudes themselves. 
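
The logarithms and their differences can be recomputed from the quotations of Table IV F. The following is a minimal sketch in Python, not part of the original text; the function name and the dictionary layout are chosen here for illustration. 

```python
import math

# Closing prices from Table IV F: (previous day, this day).
quotes = {"Stock A": (12.5, 13.0), "Stock B": (162.5, 169.0)}


def log_change(previous, current):
    """Difference of the common logarithms of two prices."""
    return math.log10(current) - math.log10(previous)


changes = {name: round(log_change(p, c), 4) for name, (p, c) in quotes.items()}

# The number whose logarithm is .0170, i.e. the proportionate change.
ratio = round(10 ** 0.0170, 3)

print(changes)
print(ratio)
```

Both stocks show the same logarithmic difference, and the antilogarithm of that difference recovers the 4 per cent increase without any basal date having been chosen. 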

Time Ratios: The quotient of a magnitude, 
such as a wage, a price, a measure of production, 
or a level of attainment, at one time divided by 
the similar measure at a different time is called 
a time ratio. A combination of such time ratios 
yields a general index, such as are the various 
business, cost of living, and price indexes. The 
properties of time ratios and of aggregates of 
them are fully treated by those making and using 
index numbers. It must here suffice to note that 
simple averages of such ratios generally have 
systematic errors of one sort or another due, 
among other things, to the generally skewed nature 
of distributions of ratios, as illustrated in 
Section 3, Table IV N and Chart IV XII. 

The Growth Curve: An important temporal series 
arising in biometric studies is one involving 
growth from the beginning, or near the beginning, 
to maturity. 

The accompanying table gives smoothed scores 
in a reasoning test as given by Kelley (1917). 
Plotted they give a typical growth curve. 


TABLE IV H 
GROWTH CURVE IN REASONING TEST ABILITY 
AGE 


This particular curve is interesting in that 
it shows a flattening at ages 13 and 14, which 
is not at all characteristic of growth curves of 
mental traits, but as the units of measurement, 
instead of intrinsic ability, could conceivably 
account for the phenomena the curve does not 
prove, but merely suggests, that there is a 
pubertal disturbance. For the purpose of the 
present statistical treatment no attention need 
be paid to the double inflection of the curve. 

Rotating the curve through 90° and looking at 
it in a mirror (as pictured in Chart IV VII) 
shows its general resemblance to an ogive curve. 
It was possible in the case of daily temperatures 
to cumulate scores and obtain ogive curve data. 
By the reverse process it is possible from the 
ogive data to obtain the original growth curve. 
The growth curve may be plotted as herewith: 


CHART IV VII 
GROWTH CURVE REASONING ABILITY 


Thinking of the abscissas as sums of increments 
of reasoning ability and recalling that 
the graph is for an average individual, whose 
maximum development or accumulation is to 94 of 
such increments (i.e., the total population of 
increments is 94) the graph may be read: At age 
7 the individual possesses 11 increments of 
reasoning ability; at age 10, 50 increments, etc. 
This may be an awkward way of interpreting growth, 
but if it is desired to think of growth as a sum 
of increments it immediately suggests the determination 
of the increments added during each 
year of life as follows: 


TABLE IV I 
Yearly Growth 


These growth increments plotted in the form 
of an ordinary frequency polygon give the graph 
shown in Chart IV VIII. 
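
The passage from a growth curve to its yearly increments is a matter of successive subtraction. The following is a minimal sketch in Python, not part of the original text; the function name is hypothetical, and the cumulative scores are invented for illustration and are NOT the values of Table IV H (only the figures 11 at age 7 and 50 at age 10 are taken from the text). 

```python
from itertools import accumulate


def yearly_increments(cumulative):
    """Difference a cumulative growth series into the amount added each year.

    `cumulative` maps age to total score. The first age is taken as the
    start of the record, so its whole value counts as its own increment.
    """
    increments = {}
    previous = 0
    for age in sorted(cumulative):
        increments[age] = cumulative[age] - previous
        previous = cumulative[age]
    return increments


# Hypothetical cumulative scores by age, for illustration only.
growth = {7: 11, 8: 24, 9: 38, 10: 50, 11: 60}

increments = yearly_increments(growth)
print(increments)

# Cumulating the increments again restores the growth curve.
restored = dict(zip(sorted(increments), accumulate(increments[a] for a in sorted(increments))))
print(restored == growth)
```

A year of loss would simply appear as a negative increment, the subtraction being carried out algebraically. 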


CHART IV VIII 


The bi-modality of the growth increment curve 
is of course a consequence of the double in- 
flection of the growth curve. Since the constants 
of this increment curve (mean, skewness, standard 
deviation, etc.) can be readily calculated, the 
curve has certain advantages over the growth 
curve. It should be a very convenient form in 
which to present data for purposes of studying 
variability in rate of growth, variability in 
price changes, etc. If the modifiability of a 
function (by education, by promotional sales 
pressure, by amount of food consumed, etc.) is 
roughly proportional to the normal rate of change 
of the function, the growth increment curve is a 
helpful graphic device. 

In dealing with functions in which there is a 
loss in a given period, e.g., when an individual 
weighs less in one year than in the preceding, 
negative frequencies arise. These need cause no 
computational trouble if treated strictly alge- 
braically and the negative sign preserved. 

Brown and Thomson (1921) have shown that the 
standard deviations of the class frequencies of 
such a curve are not given by √(Npq), the ordinary 
formula for the standard deviation of the fre- 
quency in a class. (See Chapter IX). 


SECTION 3. QUANTITATIVE SERIES 


When the things observed are such that the 
measurements taken vary in a negligible manner 
with change in time when, or place where, taken, 
we get the usual quantitative or qualitative 
series. Observations of the height or eye color 
of children fulfill these conditions if the time 
interval is so small that growth is negligible. 
Whether measured on the first of the month, the 
second, or the fourth, makes no material differ- 
ence. Again, suppose we give a 30-second speed- 
of-tapping test. In this function, there is not 
only considerable variation from day to day but 
from minute to minute, so that the scores on the 
first, the second, and the fourth may be quite 
different. Nevertheless we should, for most 
purposes, treat such a set of measures as a 
quantitative series, not because they do not 
change in time, but because they change in a 
chance manner only with a change in time. Expressed 
in another way we can say that the interest 
we take in these measures is unrelated to 
changes in time. It not infrequently happens 
that the same data, depending upon our interest 
which determines the aspect of it that we study, 
is now of one type of series and now of another. 
When time and place are insignificant per se, ог 
for our purposes are such, they are eliminated 
as axes or dimensions in the graphic portrayal 
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of the data. There remain, as dimensions for the 
chart, the value of the trait or character meas- 
ured and the frequency with which the distin- 
guishable values occur. 

The histogram and the frequency polygon: For 
illustration let us be concerned not with the 
specific dates connected with the maximum daily 
temperatures given in Table IV B, but only with 
the frequencies of occurrence of the different 
temperatures. We now have a quantitative series. 
The data are made into a frequency table, as 
given in the first column of Table IV J, prior to 
plotting as a histogram, Chart IV IX, or fre- 
quency polygon, Chart IV X. 


CHART IV IX 


DAILY MAXIMUM TEMPERATURES 


NEW YORK CITY, JULY-AUGUST 1917


CHART IV X


DAILY MAXIMUM TEMPERATURES


NEW YORK CITY, JULY-AUGUST 1917


Either the histogram or the frequency polygon
is a satisfactory device for presenting quantitative
series. The area under the histogram accurately
represents the number of observations, 62
in this case. A little simple geometry shows
that this is also the case with the frequency
polygon. A somewhat greater idea of continuity
is given by the frequency polygon than by the
histogram. When drawing the histogram, it is
convenient so to record the abscissa scale that
the rules of the paper used correspond to boundaries
of class intervals. For example, 64.5 and
65.5 are the lower and upper boundaries of the
interval within which lie temperatures recorded
as 65°, which value is the mid-point of the
interval and is called its class index. The
class limits are 64.5 and 65.5, and the class
interval, designated by the letter i, is 1°.

The reader will note that "interval" or "class 
interval" is used in a double sense. Its most 
usual meaning is the distance covered by a class 
(usually the same for all classes), that is, it 
equals the upper class limit minus the lower 
class limit. For the data in the 5° column of
Table IV J, the interval as thus defined is 5°
throughout. "Class interval" may also define
some specific class by stating the upper and
lower limits. Thus, the second class interval in
the 5° column of Table IV J includes values between
67.5° and 72.5°.
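
The statement that the histogram and the frequency polygon enclose the same area, 62, can be checked numerically. What follows is a minimal sketch only. The one-degree frequencies are an assumption, obtained by differencing the cumulative counts of Table IV P, and the names FREQ, histogram_area, and polygon_area are hypothetical.

```python
# One-degree frequencies: an ASSUMPTION, reconstructed by differencing the
# cumulative counts of Table IV P (class index -> number of days).
FREQ = {65: 1, 66: 1, 70: 1, 71: 1, 74: 2, 75: 3, 76: 1, 77: 1, 78: 3,
        79: 1, 80: 10, 81: 8, 82: 5, 83: 7, 84: 2, 85: 4, 86: 3, 87: 1,
        88: 2, 90: 1, 95: 1, 96: 1, 98: 2}


def histogram_area(freq, i=1):
    """Sum of the rectangles: each has base i and height equal to its frequency."""
    return sum(f * i for f in freq.values())


def polygon_area(freq, i=1):
    """Sum of the trapezoids under the frequency polygon.

    The polygon is closed on the base line one interval below the lowest
    class index and one interval above the highest.
    """
    lo, hi = min(freq) - i, max(freq) + i
    ys = [freq.get(x, 0) for x in range(lo, hi + 1, i)]
    return sum((a + b) / 2 * i for a, b in zip(ys, ys[1:]))


print(histogram_area(FREQ), polygon_area(FREQ))
```

Both areas agree because every ordinate of the polygon is shared by two neighboring trapezoids, each taking half of it, and the two terminal ordinates are zero.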


TABLE IV J 


FREQUENCY DISTRIBUTIONS OF DAILY MAXIMUM TEMPERATURES 
NEW YORK CITY, JULY - AUGUST, 1917


Frequencies for Various Groupings


The exact meaning of a recorded score: It
has become somewhat customary in educational
fields to speak of a child as solving 10 problems
in a speed test, meaning thereby that 10 problems
were solved and the 11th started but not finished
when time was called. In plotting the
distribution of scores the designating number,
10, has been placed at the beginning of the interval.
No objection should be made to this were
the numerical computations in harmony with this
procedure, but very generally such scores have
been treated as exactly 10.0 in calculating arithmetical
averages, with the result that the curve
and the constants computed from the data do not
agree. Not uncommonly such scores have been
treated as 10.0 scores in calculating means and
as 10.5 scores in calculating medians, with the
result that a comparison of mean and median scores
gives an entirely erroneous impression as to the
skewness of the data. This faulty procedure has
probably been followed unwittingly, but unfortunately
with the sanction of teachers. The following
is quoted from page 50 of the Second Year
Book, Division of Educational Research, Los
Angeles, July, 1919:


"LESSON SIX—THE ARITHMETIC MEAN 
Method of Finding the Mean 


No. Problems    No. Pupils
     12              3         3 X 12 =  36
     11              5         5 X 11 =  55
     10              7         7 X 10 =  70
      9              4         4 X  9 =  36
      8              2         2 X  8 =  16
                    21                  213


213 divided by 21 equals 10.14, the mean. The 
median in this distribution would be 10.64." 


In this lesson problem the mean is in error 
if 12 stands for measures in the interval 12.0 
to 13.0 and the median (see Formula [4:01]) is in 
error if it implies the interval 11.5 to 12.5. 
The error here cited probably grew out of an error
in labeling a distribution. Uniformity is needed,
and it would be in harmony with well-nigh universal
procedure in the physical and biological
fields to consider a score of 10 as being also a
class index, or mid-point of an interval. Should
this lower the grade of a few million school
children by one-half a point no harm would be
done, for all relative positions are maintained.
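
The inconsistency in the quoted lesson can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names and the offset parameters are hypothetical, and the median is found by ordinary linear interpolation within the class containing the middle case.

```python
# Score -> number of pupils, as in the quoted lesson.
SCORES = {8: 2, 9: 4, 10: 7, 11: 5, 12: 3}


def mean(freq, offset):
    """Mean when a recorded score s is taken to stand for the value s + offset."""
    n = sum(freq.values())
    return sum((s + offset) * f for s, f in freq.items()) / n


def median(freq, lower_offset, i=1):
    """Median when the class recorded as s begins at s + lower_offset."""
    n = sum(freq.values())
    half, below = n / 2, 0
    for s in sorted(freq):
        f = freq[s]
        if below + f >= half:
            return s + lower_offset + (half - below) / f * i
        below += f


# Score as class index (interval 9.5 to 10.5): mean and median agree.
print(round(mean(SCORES, 0.0), 2), round(median(SCORES, -0.5), 2))
# Score as start of interval (10.0 to 11.0): mean and median again agree.
print(round(mean(SCORES, 0.5), 2), round(median(SCORES, 0.0), 2))
```

Under either convention applied consistently, the mean and the median of these 21 scores coincide to two decimals. The lesson's figures, 10.14 and 10.64, come from taking the mean under the first convention and the median under the second, and the half-point difference is an artifact of the mixture rather than evidence of skewness.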

Certain students have objected to this proce- 
dure when the score is zero. Situations exist 
wherein a negative score is impossible so that a 
zero class (interval according to the rule -.5 to
.5) would actually have all of its frequency in
the region .0 to .5. Though the writer has never
run across an experimental situation in which the
matter was important, should it be so a different
procedure, fully explained, as to class indexes
and class limits, should be followed. The writer
believes that the use of unequal intervals carries
with it such annoyance in harmonizing computational
procedures leading to the mean with those
leading to percentiles, and of both of these with
graphic procedures, as to warrant the general
practice of treating every score (including 0) as
a class index.

Throughout this text a score, no matter how
derived originally, is uniformly to be interpreted
as a class index or mid-point of an interval
extending from half an interval below to half
an interval above.

A situation of great importance involving the 
meaning of a score arises in connection with
chronological age. A practice which is not universal,
but so common as to demand recognition,
is to speak of individuals of various age groups,
8-year-olds, 9-year-olds, upon the basis of the
age at last birthday, and not of age at nearest
birthday. Thus a child 8 years, 11 months, and
29 days will be called an 8-year-old. If this
practice were universal, we could unequivocally
call the class index of 8-year-olds 8.5, of 9-year-olds
9.5, etc., and we should then regard
a child's age, if it is merely stated that he is
an 8-year-old, as 8.5. The lack of universality
of a single practice in this important matter
necessitates that each author specify in detail
the exact meaning of the classes that he gives in
any age classification. We shall here follow the
practice of considering "8-year-olds" to consist
of all those lying between the limits 8.00
and 9.00, which provides a class with mid-point
8.50.

When drawing a frequency polygon it is con- 
venient so to place the abscissa scale that class 
indexes coincide with rulings of the paper used.
The number of days shown in Table IV J as having
a maximum temperature of 64° was zero. Accordingly
the point (64, 0) is an essential point
on the curve. The curve should be definitely 
terminated at this point and not at such a point 
as (64.5, 0). For both the histogram and the 
frequency polygon it is desirable that there be 
at least one interval between the bounding rules 
of the chart and the initial and final values 
plotted. Frequently it is desirable to shade the 
histogram so that the bounds of the curve will 
not be confused with the rulings of the paper 
used. Such shading is hardly necessary in the 
case of the frequency polygon. 

The curves as drawn show 11 modes, or temperature
values for which the frequency is greater
than for immediately smaller or larger values, as
follows: 65.5, 70.5, 75, 78, 80, 83, 85, 88, 90,
95.5, 98. If these 62 days are taken as a sample
of July-August days in general, and such would
typically be the case as one would be interested
in these data as evidence of future phenomena, it
is not to be expected that all of these modes are
trustworthy, which would mean that one would
expect the same modes to occur in 1918, 1919, etc.
These sundry modes are features of the sample and
are due to chance, or unknown causes, and are not
to be taken as descriptive of the population,
the general distribution, or average distribution
of maximum daily temperatures if for many years
July-August records are combined. Probably the
major mode, frequently called "the mode," that
in the neighborhood of 81°, is a feature that
exists in the population and is found here, somewhat
distorted, in the sample. Let us now group
the data of the first column of Table IV J into
larger and larger classes, resulting in the
several columns of Table IV J, and note what
happens to the sundry modes.


Grouping and labeling classes: The frequen- 
cies in the various columns of Table IV J are 
written opposite their respective class indexes. 
Note that when the interval is odd, 1, 3, 5, 
etc., the class index is in each instance di- 
visible by the interval, and the class limits 
are fractional, being one-half the size of the 
interval below and above the class index. Thus, 
for the interval of 3, the first class has a
class index of 66 (which is divisible by 3), a
lower limit of 64.5 and an upper limit of 67.5.
It is desirable that the class limits be fractional,
for then, if the original data are recorded to
the nearest units, no value falls exactly upon a
class limit and there is never ambiguity as to
which class to assign a value. Classes may be
unambiguously labeled in any one of the three
ways, as illustrated in Table IV K, for an interval
of 3:

The first method is by recording class indexes 
and is frequently the simplest and neatest in 
appearance. The second method is by recording 
class limits and should be employed if the size 
of the class interval is not constant throughout 
the entire range of the data. The third method 
is by recording the inclusive integral values
and is very satisfactory when making out tally
sheets from original data. If labeling is by this
third method a value such as 70 is immediately
seen to lie in the second class, while if labeling
is by class indexes one must first note that the
value 70 lies closer to 69 than to any other class
index before it can be assigned to its proper
class.


TABLE IV K
THE LABELING OF CLASSES
(i = 3)

                Class index    Class limits    Inclusive values
First class         66         64.5 - 67.5         65 - 67
Second class        69         67.5 - 70.5         68 - 70
Third class         72         70.5 - 73.5         71 - 73
etc.                etc.          etc.               etc.

If the size of the interval is even it is 
impossible to have an integral class index and a 
fractional class limit at one and the same time. 
Of these two the fractional class limit is the 
more important. Since when the interval is even 
the fractional class index cannot be exactly 
divisible by the interval we shall always so 
select the class limits as to make the lowest
integral value in a class divisible by the interval.
Thus, if i = 4, the first class is to be
consistent with the following labeling, 65.5, or
63.5-67.5, or 64-67 (the 64 being divisible by 4).

This procedure is to be followed rigorously, 
including the common and important case when 
i = 10. Here the classes run as follows: 60-69,
70-79, etc., and the class indexes are 64.5, 74.5,
etc. (not 65, 75, etc.), and the class limits
are 59.5, 69.5, etc. (not 60, 70, etc.). If the
original data are dollar and cent magnitudes, but
not amounts in fractional parts of a cent, then
the classes could be: 60.00-69.99, 70.00-79.99,
etc., having class indexes 64.995, 74.995, etc.,
and class limits 59.995, 69.995, etc. It is to
be noted that even in this case 65, 75, etc., are
not the class indexes.
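
The two labeling rules, for odd and for even intervals, can be put in a short function. This is a sketch under stated assumptions: the data are integers, the name label_class is hypothetical, and the return layout is chosen only for illustration.

```python
def label_class(value, i):
    """Locate the class of an integer value for a class interval i.

    Odd i:  the class index is an integer divisible by i.
    Even i: the lowest integral value in the class is divisible by i.
    Returns (class index, lower limit, upper limit, lowest value, highest value).
    """
    if i % 2 == 1:
        half = (i - 1) // 2
        lowest = ((value + half) // i) * i - half
    else:
        lowest = (value // i) * i
    highest = lowest + i - 1
    index = (lowest + highest) / 2
    return index, lowest - 0.5, highest + 0.5, lowest, highest


print(label_class(70, 3))    # second class of Table IV K
print(label_class(65, 4))    # first class when i = 4
print(label_class(64, 10))   # the 60-69 class when i = 10
```

In each case the class limits come out fractional, so that no value recorded to the nearest unit can fall on a limit.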

In spite of all efforts to employ classes so 
that boundary values will not be found in the 
original data, it will sometimes happen that this 
cannot be done, so we must adopt a rule for the 
allocation of boundary values. We do not wish 
always to put the boundary value in the class 
above, or always in the class below, for either 
of these procedures will introduce a systematic 
error. To avoid this, we adopt the rule of
assigning a boundary value to the neighboring
value that is even. For example, the value 66.5
will be called 66, and not 67, and thus it is
thrown in the lower class. The value 73.5 is to
be called 74, and not 73, and is thus thrown into
the higher class. This rule is commonly followed
by astronomers and workers in the physical
sciences. It is a serviceable one for general
adoption, though it does not meet all difficul- 
ties. If boundaries of classification can be so 
arranged that the rule is not called into play, 
they should be so arranged. 
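
The even-value rule for boundary cases corresponds to what is now usually called rounding half to even. A minimal sketch follows, using Python's standard decimal module. The function name to_even is hypothetical, and values are passed through str so that a boundary value such as 66.5 is read exactly as written.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def to_even(value):
    """Round to the nearest integer, sending an exact boundary value to the even neighbor."""
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))


print(to_even(66.5))   # called 66, thrown into the lower class
print(to_even(73.5))   # called 74, thrown into the higher class
```

Because half of the boundary values go down and half go up in the long run, the rule does not introduce the systematic error that a fixed direction would.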


Effect of grouping upon modes: Following 
this diversion upon grouping and labeling we 
will return to the question of the effect of 
grouping upon modes. An examination of the
frequencies in the column for which i = 2 of Table
IV J shows that there are five modes at temperatures
as follows: 65.5, 70.5, 74.5, 80.5, 98.5.
With a little coarser grouping, i = 3, there are
three modes, 66, 81, 97.5. With the grouping
i = 4 there are two modes, 81.5 and 97.5. With
the grouping i = 5 there is but one mode, 80.
With the next coarser grouping, i = 6, it would
be found that two modes appear. However, we may
say that in general the coarser the grouping the
greater the tendency to eliminate minor modes.
The coarseness of grouping systematically affects
the height of the curve at the mode, as shown in
Table IV L herewith.
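
The effect of grouping upon the modes can be traced mechanically. The following is a sketch only. The one-degree frequencies are an assumption, reconstructed by differencing the cumulative counts of Table IV P, so individual classes may not agree with Table IV J in every particular. The names group and modes are hypothetical. A run of equal neighboring frequencies is treated as a single mode at its center, which is how values such as 65.5 and 95.5 arise.

```python
# ASSUMED one-degree frequencies (class index -> days), from Table IV P.
FREQ = {65: 1, 66: 1, 70: 1, 71: 1, 74: 2, 75: 3, 76: 1, 77: 1, 78: 3,
        79: 1, 80: 10, 81: 8, 82: 5, 83: 7, 84: 2, 85: 4, 86: 3, 87: 1,
        88: 2, 90: 1, 95: 1, 96: 1, 98: 2}


def group(freq, i):
    """Regroup into classes of width i, keyed by class index.

    Odd i: class index divisible by i. Even i: lowest value divisible by i.
    """
    half = (i - 1) // 2 if i % 2 else 0
    grouped = {}
    for value, f in freq.items():
        lowest = ((value + half) // i) * i - half
        index = lowest + (i - 1) / 2
        grouped[index] = grouped.get(index, 0) + f
    return grouped


def modes(grouped, i):
    """Class indexes whose frequency exceeds that of both neighbors."""
    lo, hi = min(grouped), max(grouped)
    n = round((hi - lo) / i) + 1
    xs = [lo - i] + [lo + k * i for k in range(n)] + [hi + i]
    ys = [0] + [grouped.get(x, 0) for x in xs[1:-1]] + [0]
    found, k = [], 1
    while k < len(ys) - 1:
        j = k
        # Extend across a plateau of equal frequencies.
        while j + 1 < len(ys) - 1 and ys[j + 1] == ys[k]:
            j += 1
        if ys[k] > ys[k - 1] and ys[k] > ys[j + 1]:
            found.append((xs[k] + xs[j]) / 2)
        k = j + 1
    return found


for i in (1, 2, 4, 5, 6):
    print(i, modes(group(FREQ, i), i))
```

With the frequencies as reconstructed here, the groupings i = 1, 2, 4, 5, and 6 give 11, 5, 2, 1, and 2 modes respectively, in agreement with the text. For i = 3 the reconstructed counts also show a small mode at 75, so the reconstruction may differ slightly from the author's table in that region.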


TABLE IV L 


MODAL FREQUENCY WITH VARIOUS GROUPINGS 


Size of Interval        Frequency at Mode


For the purposes of graphic portrayal we should 
follow the principle of grouping coarsely enough 
to suppress such secondary modes as are probably 
due to chance, but not such as are probably evi- 
dence of secondary modes existing in the popu- 
lation. In short, we should aim so to group as 
to suppress the idiosyncrasies of the sample 
while preserving the verities of the population. 
The precise answer to the question "how coarse 
to group to accomplish this result?" depends 
upon the nature of the distribution as well as 
the size of the sample. Since many distributions 
are quite similar to the normal distribution in 
the neighborhood of their major modes, an answer 
strictly applicable to samples drawn from normal 
populations may be quite serviceable for many 
other situations as well, and particularly so if 
concern is as to the location of the major mode. 

Since a graphic portrayal that is misleading 
even as to the location of the major mode will 
be more so as to the existence and location of 
minor modes, we surely should adopt a sufficiently 
coarse grouping that there is probably not a mis- 
judgment as to the interval in which the popu- 
lation major mode lies. 

The detailed steps involved in the solution 
of this problem and leading to Table IV M are 
given in Chapter XIII, Sections 10 and 11. The
reader is advised to increase, generally very
slightly, the number of classes he employs if his
data are leptokurtic, i.e., show frequencies in
excess of those normally expected in the extreme
tail regions and at the mode.


TABLE IV M 


THE NUMBER OF CLASSES TO USE FOR GRAPHIC
PORTRAYAL OF SAMPLES OF SIZE N DRAWN
FROM A NORMAL POPULATION


   N          No. of Classes
  4-5               2
  6-8               3
  9-14              4
 15-21              5
 22-32              6
 33-46              7
 47-64              8
 65-89              9
 90-117            10
118-153            11
154-192            12
193-255            13
256-315            14


Applying the information given in Table IV M,
we see that we should employ 8 classes to present
the distribution of the 62 maximum daily
temperatures. The range of temperatures is
(98 - 65 + 1) or 34°. Dividing 34 by 8 gives 4.25,
which to the nearest integer is 4, as the preferred
size of interval. A plot with this interval
is shown in Chart IV XI. The reader is asked
to fix firmly in mind that a grouping as coarse
as this is recommended for graphic portrayal
only, and that a finer grouping, i = 2, as explained
in Chapter VII, is desirable for statistical
computations leading to measures of central tendency,
of variability, and of correlation.


CHART IV XI


DAILY MAXIMUM TEMPERATURES 
NEW YORK CITY, JULY-AUGUST 1917

[Ordinate: NO. OF DAYS. Abscissa: TEMPERATURE.]


Chart IV XI shows a decided major mode in the 
neighborhood of 82° and a minor mode in the
neighborhood of 98°. The grouping was such that
the existence of a major mode in the neighborhood
of 82° can be trusted, but the class frequencies
in the neighborhood of 98° are so small that the
existence of a mode in this region is to be
doubted. In other words, the grouping employed
was designed to give a certain confidence with
reference to the location to within about a half
an interval of a major mode only. The more precise
method for locating this mode is given in
Chapter VII, Section 3. A considerably coarser
grouping would be necessary before a much smaller
minor mode found in the sample can be assumed to
be a feature of the population. 
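
The choice of plotting interval from Table IV M amounts to a table lookup followed by a division. A minimal sketch is given below. The table entries are transcribed from Table IV M, the names TABLE_IV_M, classes_for, and plotting_interval are hypothetical, and the rounding to the nearest integer follows the worked example for the temperature data.

```python
# (smallest N, largest N, number of classes), transcribed from Table IV M.
TABLE_IV_M = [(4, 5, 2), (6, 8, 3), (9, 14, 4), (15, 21, 5), (22, 32, 6),
              (33, 46, 7), (47, 64, 8), (65, 89, 9), (90, 117, 10),
              (118, 153, 11), (154, 192, 12), (193, 255, 13), (256, 315, 14)]


def classes_for(n):
    """Number of classes recommended for graphic portrayal of a sample of size n."""
    for lo, hi, k in TABLE_IV_M:
        if lo <= n <= hi:
            return k
    raise ValueError("sample size outside the range of Table IV M")


def plotting_interval(lowest, highest, n):
    """Range in recorded units (inclusive of both ends) divided by the class count."""
    span = highest - lowest + 1
    return span / classes_for(n)


raw = plotting_interval(65, 98, 62)
print(classes_for(62), raw, round(raw))
```

For the 62 temperatures this gives 8 classes and a raw interval of 4.25, which is taken as 4 for plotting.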

Frequency polygon of a skewed distribution:
As an illustration of the tendency for a distribution
of ratios to be skewed, and as an important
problem in plotting, we will study the
data of Table IV N, drawn from Mitchell, as quoted
by Secrist (1917).


TABLE IV N
DISTRIBUTION OF 5578 CASES OF CHANGE IN THE WHOLESALE
PRICES OF COMMODITIES FROM ONE YEAR TO THE NEXT


PER CENT OF CHANGE                PER CENT OF CHANGE
FROM THE AVERAGE       NUMBER     FROM THE AVERAGE       NUMBER
PRICE OF THE             OF       PRICE OF THE             OF
PRECEDING YEAR          CASES     PRECEDING YEAR          CASES
(FALLING PRICES)                  (RISING PRICES)
н 54-55. 9 І 4- 5.9 356 
B 1-1. 
| 08-0979 ! 10-11.9 167 
16-47. 9 | 12-13.9 115 
44-5. 9 2 14-15.9 108 
| 42-43. 9 M 16-17.9 102 
| 30-01. 9 5 18-19.9 73 
Н 38-39. 9 5 20-21.9 65 
) 36-37.9 7 22-23.9 45 
34-35. 9 10 ERE i 
32-33. 9 7 26-27.9 29 
30-31.9 16 23223 50 
28-29.9 27 30-31.9 22 
бы 17 32-33.9 17 
ИРЕ ра 84-35.9 18 
22-23.9 39 ав Jur 
MES M 28 88-39, 9 17 
| 18-19.9 7! 40-41.9 14 
| 16-17.9 76 54:9 6 
| 14-15.9 107 цу 45. 9 10 
\ 12-13. 9 120 46-47. 9 и 
10-11.9 173 И 5 
| 8- 9.9 200 БЫЗ i 
6- 7.9 238 59:53.9 ji 
4- 5.9 329 о 3 
2- 3.9 375 56-57.9 ! 
UNDER 2 805 SEA 6 
NO CHANGE CK 3 
(RISING PRICES) er сея 
UNDER 2 
2- 3.9 355

66-67.9 
68-69.9 
70-71.9 
72-73.9 
74-75.9 


80-81.9                N = 5,578


100-101.9 
102-103.9 


It will be noticed that the class intervals 
extend over ranges of two units, e.g., there are 
five class intervals in covering a rise in prices 
from 10 per cent to (but not including) 20 per
cent. With no direction to the contrary it is to 
be presumed that the class designated in the 
table by "54 - 55.9" includes all measures with 
values between the limits 53.95 and 55.95; that 
the next class includes measures between 51.95 
and 53.95; etc. This is to say that presumably 
the data have been recorded to but one decimal 
place so that such measures as 53.86 and 53.92 
are called 53.9 and a measure such as 53.96 is 
recorded as 54.0. If the recorder encountered a 
measure 53.95, he had to decide arbitrarily whether 
it would be called 53.9 or 54.0. For the data 
in hand it is not definitely known how such a 
case would have been decided, but we shall assume 
that the preferred and customary procedure was 
followed. 
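
The limits implied by a class label follow from the unit of recording. The sketch below assumes, as the text does, that the data were recorded to one decimal place. The function name implied_limits is hypothetical, and Decimal arithmetic is used so that values such as 53.95 are exact.

```python
from decimal import Decimal


def implied_limits(label_low, label_high, unit="0.1"):
    """Class limits implied by a label when measures are recorded to the given unit.

    A recorded value stands for everything within half a unit on either side,
    so the limits lie half a unit beyond the labeled extremes.
    """
    half = Decimal(unit) / 2
    return Decimal(label_low) - half, Decimal(label_high) + half


print(implied_limits("54", "55.9"))   # limits of the class labeled "54 - 55.9"
print(implied_limits("52", "53.9"))   # limits of the next class
```

The upper limit of one class coincides with the lower limit of its neighbor, so the classes cover the scale without gap or overlap.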

If the class intervals run in order from 53.95
to 55.95, 51.95 to 53.95, and so on to 3.95, it
is found that the next frequency, in order to
extend over the same range, would be from -.05
to 1.95, i.e., from an increase in price of .05
per cent to a decrease of 1.95 per cent. This,
however, cannot be the case, as a very large
frequency, 697, is recorded for "no change." The
way the data are recorded would suggest a class
interval corresponding to "no change," but this
cannot be so as the intervals on either side
preempt the space. The "no change" cases presumably
fall at a point and not at variable points
within an interval. In plotting the data, therefore,
the "no change" interval must be squeezed
out and its frequency, 697, distributed between
the neighboring classes. We will assign 348 to
the "under 2, Falling prices" interval, and the
remainder, 349, to the "under 2, Rising prices"
interval. There still is a slight discrepancy
(.05) in the ranges of these two middle intervals,
but as it cannot be positively accounted for
without recourse to the original data it is
passed over.


For convenience in tabulation and plotting we 
will consider the first class interval to extend 
from 54.00 to 56.00 and to have its mid-point or 
class index 55.00, the second a mid-point at 
53.00, etc., and the frequencies as before. 


The variable in Table IV N has a range of 
values from approximately -55.95 to 103.95 or 
159.90. The number of cases is 5,578. Reference 
to Chapter XIII, Table XIII H, suggests that we 
should have about 33 plotted points, which corresponds
to an interval of 4.8. However, the data
are very leptokurtic, so that an interval of
this size will be coarser than would be desirable
in the neighborhood of the mode and finer than
desirable in the tail regions. As we are limited
in our options to 2, 4, 6, etc., we shall choose
an interval of 4, tabulating the data as shown
in Table IV O.


TABLE IV O


PER CENT OF CHANGE FROM THE           CLASS INTERVAL
AVERAGE PRICE OF THE PRECEDING YEAR   OF 4 PER CENT


the impression
would a histogram and in this case, judging by
the number of zero change items, this feature


CHART IV XII


CHANGE IN 5578 WHOLESALE PRICES
FROM ONE YEAR TO NEXT

[Abscissa: PER CENT OF PRICE CHANGE.]


Five ways of drawing a graph: There are
three common ways of making a graph: (a) by
erecting rectangles upon the appropriate base
intervals; (b) by connecting plotted points by
means of straight lines; and (c) by drawing a
smooth curve through or near all the points which
fits the data as nearly as can be determined
visually. For quantitative data (a) and (b)
result in the histogram and frequency polygon
respectively. A fourth though less common way
(d) is to plot from smoothed data; and a fifth
(e) is the advanced method of mathematically
determining the equation of the curve which best
fits the data and plotting the same. Methods
(a), (b), and (e), and usually (d), preserve
areas, i.e., the total area under the curve is
equal to the number of cases in the sample.
Method (e) also preserves other important features.



In using method (c) there should be a definite 
attempt to preserve areas; that is, if the curve 
as drawn lies above any point it should lie below 
some other, or, more accurately, the sum of the 
vertical distances which it lies above points in 
the actual distribution should equal the sum of 
the distances which it lies below other points. 
In drawing a free curve for such a distribution
as that of incomes in the U.S., the preservation
of the total area is difficult to insure, but for
maximum temperatures, Table IV J and Chart IV X,
it can be accomplished with fair accuracy and
little trouble. The personal element which enters
into method (c) generally makes it inadvisable
for published work; but for original, hasty and
personal research it may well be used frequently.


The cumulative frequency curve, or ogive, or 
percentile graph: When the frequencies in the 
successive classes of an ordinary unimodal fre- 
quency distribution are cumulated, interval by 
interval, beginning at either end, there result 
frequencies having values of the variate less 
than, or greater than, the successive limits of 
the intervals of the frequency distribution. The 
curve plotted from these is variously called a 
"cumulative frequency graph," an "ogive," or, in 
the important case where percentage frequencies 
are used, a "percentile graph," or "percentile 
curve." 

The temperature data, as given in the first 
two columns of Table IV J, may be used to illus- 
trate such a graph. From the column of Table 
IV J giving the finest grouping available, we 
immediately obtain Table IV P. 

The student should carefully note the exact
boundary values which apply to the values given
in column 3. Table IV J states that the maximum
temperature for the coldest day was 65°. As the
grouping interval is 1° this asserts that for
this day the maximum temperature was greater
than 64.5° and less than 65.5°. We therefore can
assert, as is done in column two of Table IV P,
that there was one day with a maximum temperature
less than 65.5°. Of course it is incorrect to
assert that one day had a maximum temperature
less than 65°. Unfortunately this type of error
is common and the student is advised to compute
points on published percentile charts, if the
issue is important and if the basic data are given
so that such computation is possible.


TABLE IV P


CUMULATIVE FREQUENCIES OF DAILY MAXIMUM TEMPERATURES
New York City, July-August, 1917


                 NO. OF DAYS       PROPORTION OF
                 HAVING LOWER      DAYS HAVING
TEMPERATURES     MAXIMUM           LOWER MAXIMUM
                 TEMPERATURES      TEMPERATURES

    63.5              0               .000
    64.5              0               .000
    65.5              1               .016
    66.5              2               .032
    67.5              2               .032
    68.5              2               .032
    69.5              2               .032
    70.5              3               .048
    71.5              4               .065
    72.5              4               .065
    73.5              4               .065
    74.5              6               .097
    75.5              9               .145
    76.5             10               .161
    77.5             11               .177
    78.5             14               .226
    79.5             15               .242
    80.5             25               .403
    81.5             33               .532
    82.5             38               .613
    83.5             45               .726
    84.5             47               .758
    85.5             51               .823
    86.5             54               .871
    87.5             55               .887
    88.5             57               .919
    89.5             57               .919
    90.5             58               .935
    91.5             58               .935
    92.5             58               .935
    93.5             58               .935
    94.5             58               .935
    95.5             59               .952
    96.5             60               .968
    97.5             60               .968
    98.5             62              1.000

It is also to be noted that the information 
given by the fourth row, that "2 days had a maxi- 
mum temperature less than 66.5," is just as true 
and neither more nor less true than that given by 
the fifth, sixth, or seventh rows, that "2 days 
had a maximum temperature less than 67.5, 68.5, 
and 69.5," respectively. Since these are all 
equally sound we can secure a certain desirable 
smoothing and a simplicity of statement if, prior
to plotting, we replace these four statements by 
the single statement, "2 days had a maximum 
temperature less than 68.0," the 68.0 being the 
average of 66.5 and 69.5. This is recommended 
procedure wherever multiple statements occur, 
except in the case of statements referring to 
terminal situations. 

At the lower end we have "zero days had a maximum
temperature less than 64.5." Also "less than
63.5." Also "less than 62.5." Etc., without
limit. Since the series 64.5, 63.5, 62.5, etc.,
continues without limit, the average of the
smallest and the largest is indefinite. Therefore
we will not plot it and we will not assume that we
have any trustworthy information from the sample
of what is the temperature such that in the population
"zero days will have a less maximum temperature."
In other words, the zero percentile (also
the 100) is completely indeterminate from any information
given by the sample. We accordingly
will not plot the zero percentile (nor the 100
percentile). In fact, the lowest percentile to be
plotted is that given by the proportion of cases in
the smallest interval having one or more cases in
it. This proportion, in this problem, is 1/62, or
.016, so that the lowest computable percentile is
the 1.6 percentile. The highest computable percentile
is established by one minus the proportion
of cases in the highest interval having one or
more cases in it. In this problem this proportion
is (1 - 2/62), or .968, so that 96.8 is the highest
computable percentile.
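
The cumulation, the averaging of repeated statements, and the exclusion of the terminal statements can be sketched as follows. This is an illustration only. The one-degree frequencies are an assumption, obtained by differencing the cumulative counts of Table IV P, and the names cumulative and smooth are hypothetical.

```python
# ASSUMED one-degree frequencies (class index -> days), from Table IV P.
FREQ = {65: 1, 66: 1, 70: 1, 71: 1, 74: 2, 75: 3, 76: 1, 77: 1, 78: 3,
        79: 1, 80: 10, 81: 8, 82: 5, 83: 7, 84: 2, 85: 4, 86: 3, 87: 1,
        88: 2, 90: 1, 95: 1, 96: 1, 98: 2}
N = sum(FREQ.values())


def cumulative(freq):
    """Rows of (upper class limit, number of days below that limit)."""
    rows, running = [], 0
    for t in range(min(freq) - 1, max(freq) + 1):
        running += freq.get(t, 0)
        rows.append((t + 0.5, running))
    return rows


def smooth(rows, n):
    """Replace each run of equal counts by one row at the average of its end limits.

    Runs with count 0 or n are terminal statements and are not plotted.
    """
    out, k = [], 0
    while k < len(rows):
        j = k
        while j + 1 < len(rows) and rows[j + 1][1] == rows[k][1]:
            j += 1
        count = rows[k][1]
        if 0 < count < n:
            out.append(((rows[k][0] + rows[j][0]) / 2, count))
        k = j + 1
    return out


points = smooth(cumulative(FREQ), N)
print(points[0], points[-1])
print(round(points[0][1] / N, 3), round(points[-1][1] / N, 3))
```

The first and last plotted points correspond to the proportions .016 and .968, the lowest and highest computable percentiles, and the four statements about "2 days" collapse into the single point (68.0, 2).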

Another way of expressing these facts is to 
say that from this sample of 62 days percentiles 
less than 1.6 and greater than 96.8 have infinite
standard errors, while percentiles between these
two values have finite standard errors, though as
will be shown later the presumptive errors in the
computed percentiles near 1.6 and 96.8 will be
very large.

If we smooth as suggested we obtain the fol- 
lowing table: 


TABLE IV Q 


CUMULATIVE FREQUENCIES OF DAILY MAXIMUM TEMPERATURES 
NEW YORK CITY, JULY-AUGUST, 1917 

Temperatures        Proportion of days having
                    lower maximum temperatures


The plot of these data is shown by the dot
line curve of Chart IV XIII. Special care is 
called for in the labeling of the axes. The terms 
"ogive" and "percentile" are technical terms and 
somewhat undesirable for use in the title to the 
Chart. They are quite unsatisfactory in connection
with axis labels.


CHART IV XIII 
DAILY MAXIMUM TEMPERATURES 


NEW YORK CITY, JULY-AUGUST 1917

[Ordinate: TEMPERATURE. Abscissa: PERCENTAGE OF DAYS HAVING LOWER MAXIMUM TEMPERATURES.]


The values for the chart as shown have been 
laborious to compute, as there are many of them, 
and annoying to plot, as they are not integral 
values. It is accordingly economical and not 
misleading to calculate a small number of se- 
lected percentiles and plot these values only. 
We have earlier noted that for a sample of 62 
eight classes are sufficient for graphic por- 
trayal. Some number of computed percentiles, say 
not less than eight, should suffice for a satis- 
factory ogive. Let us therefore compute the 
following percentiles: 2, 5, 10, 20, 35, 50, 65, 
80, 90, 95, 96, and plot the ogive from these. 
The values 2 and 96 have been chosen because they
are the smallest and the largest integral percentiles
calculable from the data.


The computation of percentiles: The computation
follows formula [4:01], which is simple to
use and is self-explanatory as soon as the meaning
of the symbols is grasped.

$N$ = the number of cases in the sample.

$P_p$ = the percentile in question (if $p = .10$
then $P_{.10}$ = the tenth percentile).

$p$ = the proportion of cases having smaller
values than $P_p$.

$pN$ = the number of cases having smaller values
than $P_p$. If exactly $pN$ cases lie below
some class limit, that limit = $P_p$. (Note,
however, the $P_{.05}$ example following.)

Determine the class wherein the $pN$ measure
lies and let

$v_p$ = the value of the lower limit of this
class.

$f_p$ = the frequency, or number of cases, in
this class.

$i_p$ = the interval, or range, covered by this
class.

$F_p$ = the sum of the frequencies in all classes
below (i.e., classes with smaller X values
than) this class.

Then

$$P_p = v_p + \frac{pN - F_p}{f_p}\, i_p \qquad \text{Value of a percentile [4:01]}$$

This formula is convenient for finding an assigned
percentile.
A not infrequent problem is to find the
percentile value of an assigned score. Let this
assigned value be $P_p$. Then every term in [4:01]
except $p$ is known and this may be solved for, thus:

$$p = \frac{F_p + \dfrac{f_p\,(P_p - v_p)}{i_p}}{N} \qquad \text{Proportion of } N \text{ falling below } P_p \text{ [4:02]}$$


Example (a) involving no complications: compute
the 10th percentile for the temperature data
of column "i = 1°" of Table IV J and Table IV P.
The computation could also be based upon the
frequencies in column "i = 2°" with satisfactory
results, for the number of classes is then still
greater than 12, but still coarser grouping
should be avoided.

$p = .10$
$pN = 6.2$

We observe that the 6.2 measure lies in the
class with class index 75.0 and that

$v_p = 74.5$
$f_p = 3$
$i_p = 1$
$F_p = 6$

Accordingly

$$P_{.10} = 74.5 + \frac{6.2 - 6}{3}\,(1) = 74.57$$


Example (b) wherein the class containing the pN
measure is neighboring to one or more classes
with zero frequency: Compute $P_{.05}$.

$p = .05$
$pN = 3.1$

We observe that the 3.1 measure lies in the
class in Table IV J with class index 71 and that
the two neighboring classes above have zero
frequency. We therefore consider the frequency,
1, found in this class to extend over a stretch
equal to that of this class plus half that of
the neighboring zero classes, so that

$v_p = 70.5$ and the upper class limit = 72.5

giving

$f_p = 1$
$i_p = 2$
$F_p = 3$

Thus

$$P_{.05} = 70.5 + \frac{3.1 - 3}{1}\,(2) = 70.70$$
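The two percentile formulas can be checked numerically. The following is a minimal sketch, not the author's procedure: the function and argument names are illustrative assumptions, and the inputs are simply the class constants quoted in examples (a) and (b) for the sample of 62.

```python
def grouped_percentile(p, n, v, f, i, big_f):
    """Formula [4:01]: P_p = v + (pN - F) * i / f."""
    return v + (p * n - big_f) * i / f


def proportion_below(score, n, v, f, i, big_f):
    """Formula [4:02]: p = (F + f * (P - v) / i) / N."""
    return (big_f + f * (score - v) / i) / n


# Example (a): tenth percentile, class 74.5 to 75.5.
p10 = grouped_percentile(0.10, 62, v=74.5, f=3, i=1, big_f=6)

# Example (b): fifth percentile, class stretched to 70.5 to 72.5.
p05 = grouped_percentile(0.05, 62, v=70.5, f=1, i=2, big_f=3)

# Round trip: the proportion falling below P_.10 should be .10 again.
back = proportion_below(p10, 62, v=74.5, f=3, i=1, big_f=6)

print(round(p10, 2), round(p05, 2), round(back, 2))
```

Because [4:02] is only [4:01] solved for p, feeding a computed percentile back through the second function recovers the proportion that produced it.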


Having the computed values for percentiles
conveniently spaced from the smallest, $P_{.02}$, to
the largest, $P_{.96}$, the percentile curve may quickly
be plotted, as shown by the full line curve of
Chart IV XIII. Comparison with the dot line
curve shows that this curve based upon but 11
values, all integral, is entirely adequate.

The issue of the reliability of percentiles
could be appropriately studied after that of
variability, as given in Chapter VI, but to
impress the inescapable intimacy between a
statistic and its error, the standard error of a
percentile is introduced at this point. The
beginning student is not expected to grasp fully
the meaning of a standard error at this point.
After studying Chapter VI he should re-read
this section.
In calculating the standard error of a
percentile it is desirable to use an interval as
large as that recommended for graphic portrayal
and to make the center of the interval coincide
as nearly as possible with the percentile in
question. We therefore define two new constants,
$i'$ and $f'$:

$i'$ = the interval recommended for graphic
portrayal.

$f'$ = the frequency in the $i'$ interval having
a class index as nearly equal to $P_p$ as
the original grouping (i.e., the data as
first recorded prior to any grouping) of
the data permits.

We also let $q = 1 - p$.

Then,

$$\sigma_{P_p} = \frac{i'}{f'}\sqrt{Npq} \qquad \text{Standard error of a percentile* [4:03]}$$


In the derivation of this, as of all other
standard error formulas, the elements in the
right-hand member call for population, or true,
values. Since these are unavailable, necessity
dictates that sample values be substituted, with
the result that the standard error computed does
itself have a standard error or element of
uncertainty, but as this usually attaches to more
remote figures than the first or second, the
computed value remains a highly important
statistic.


* Derived in Kelley, 1923, Formula [+2].
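As a rough illustration of the standard error of a percentile, the sketch below evaluates $(i'/f')\sqrt{Npq}$ directly. The function name is an assumption, and the values of $i'$ and $f'$ used in the call are hypothetical, chosen only to show the arithmetic; they are not taken from the temperature tables.

```python
import math


def percentile_standard_error(n, p, i_prime, f_prime):
    """Standard error of a percentile: (i' / f') * sqrt(N * p * q), q = 1 - p."""
    q = 1.0 - p
    return (i_prime / f_prime) * math.sqrt(n * p * q)


# Hypothetical figures: a 3-degree interval holding 12 of the 62 cases,
# centered near the median (p = .50).
se_median = percentile_standard_error(n=62, p=0.50, i_prime=3.0, f_prime=12)
print(round(se_median, 2))
```

A thinly populated interval (small $f'$) or a wide one (large $i'$) enlarges the standard error, which is why percentiles in the sparse tails of a distribution are less reliable than those near the center.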


SECTION 4. QUALITATIVE SERIES 


The qualitative or categorical is the most 
primitive of all statistical series. It conveys 
the minimum of information about the data, in- 
forming one simply that there are different 
classes and different numbers of cases or differ- 
ent values attached to the different classes. 
If certain temporal, geographic, or quantitative 
information inherent in series of these types is 
neglected in studying the data of these types, 
the series degenerate into qualitative series, 
for there still remain different categories and 
attached amounts or frequencies. 

There are two important sub-types to quali- 
tative data: (1) in which random sampling of a 
population has resulted in observed frequencies 
in a number of categorical classes, and (2) in 
which an observation of some specific sort taken 
upon a number of categories shows different 
values, not numbers of cases, attached to the 
respective classes. The second column of Table
IV R (hypothetical) is of the first type, while 
the third column is of the second. The dis- 
tinction between these sub-types is important in 

TABLE IV R

VOCATIONAL DISTRIBUTION IN NOTOWN OF WORKERS AND OF
CAPITAL LOCALLY INVESTED IN CONNECTION WITH

                     NUMBER OF     CAPITAL
VOCATIONS            WORKERS       EMPLOYED

Farming                  80
Mining
Transportation                     $ 10,000
Manufacturing                       110,000
Selling                              40,000
Personal Service                      5,000
Other

Total                              $400,000

connection with computational procedures, for the
reliability of the frequency in classes obtained
as a result of random sampling is known (see
[9:02]) but the reliability of amounts is not
ascertainable without more information than is
commonly available. The reason for this is that
each case (worker employed) is presumably an
independent observation resulting from a process
of enumeration, but each amount (dollar of
capital employed) is presumably not an observation
which is independent of every other amount. We
may not reasonably look upon the $400,000 as a
result of 400,000 measures chosen at random from
an infinite population. If there exists Sometown,
which upon acquaintance is judged to be generally
similar to Notown and having 200 workers, we
may expect to find, within limits suggested by

$$\sqrt{\frac{200 \times 45 \times 155}{200 \times 200}} \;\; (= 5.9),$$

i.e., the standard deviation of
the frequency in a class, that the number of
workers engaged in manufacturing will be 45, but
we should not expect to find, within limits
suggested by

$$\sqrt{\frac{400{,}000 \times 110{,}000 \times 290{,}000}{400{,}000 \times 400{,}000}} \;\; (= 282),$$

that the amount of capital employed in
manufacturing will be $110,000. This difference is
of great importance in statistical deduction.
It is even important in the reading of a graph.
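The two radicals above are both of the form $\sqrt{Npq}$, and the arithmetic can be confirmed with a short sketch. The function name is an assumption; the inputs are the figures quoted in the text. The point of the passage is that the computation is legitimate only for the workers, since the dollars are not independent observations; the code merely reproduces both numbers.

```python
import math


def sd_of_class_frequency(n, count):
    """sqrt(N * p * q), with p = count / n the proportion in the class."""
    p = count / n
    return math.sqrt(n * p * (1.0 - p))


# 45 manufacturing workers among 200: each worker an independent case.
workers_sd = sd_of_class_frequency(200, 45)

# 110,000 dollars among 400,000: same arithmetic, but not a valid error
# estimate, because dollars of capital are not independent observations.
dollars_sd = sd_of_class_frequency(400_000, 110_000)

print(round(workers_sd, 1), round(dollars_sd))
```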

Customary devices for representing categorical
data are the bar graph, Chart IV XIV, the
segmented bar, the circle graph or pie chart
which is a circle with sectors, Chart IV XV, and
pictograms representing both the number of cases
in and the nature of categories, as, e.g., would
be done if 80 farmers are represented by, say,
16 pictures of men with hoes, and seven miners
by 1.4 men of the same size as the farmers but
with miners' caps and lamps, etc. The first three
devices require descriptive labels, but the
pictogram is serviceable for non-readers or others more
inclined to the comics than the editorial page.


CHART IV XIV
WORKERS AND CAPITAL IN NOTOWN

[Bar graph: NUMBER OF WORKERS and CAPITAL EMPLOYED (IN $000) for Farming, Manufacturing, Selling, Personal Service, Transportation, Mining, and Other.]


CHART IV XV
DISTRIBUTION OF CAPITAL IN NOTOWN

[Circle graph.]


SECTION 5. GEOGRAPHIC SERIES


The feature of the spatial series not shared 
by quantitative, qualitative, or temporal series 
is that the data are tied to a position in space 
which is important for the issues in question 
and must not be neglected. If neglected the 
series ceases to be spatial and becomes, gener- 
ally, a quantitative series. This is commonly 
the case when names instead of geographic po- 
sitions are given in connection with products of 
districts, e.g., if the 1941 chicken production
for the counties of California is given in a
table naming the counties and the number of
chickens produced, the reader who does not have
a map of California, showing counties, before
him or in his mind's eye is presented with nothing
but a quantitative series. If this information 
is presented on a map it is a geographic series, 
which is the most common sub-type under spatial 
series. Just as the temporal series for the 
daily maximum temperatures in July and August in 
New York City ceased to be temporal when the 
temperatures were divorced from the particular 
dates with which originally connected, so a geo- 
graphic series ceases to be such when the items 
are so presented that the reader does not attach 
them to geographic position. For the existence 
of a spatial series a spatial metric,—one, two, 
or three dimensions as the case may be,— must be 
employed. 

The one- and the three-dimensional spatial
series are unusual but very interesting when
relevant. A survey giving for New York City the
amounts of dust in the air per cubic foot for
different spatial positions, each having a
north-south, an east-west, and a
distance-above-street-level dimension, is an
illustration of a three-dimensional spatial
series. Its representation requires a
three-dimensional model (or a stereoscopic
picture) with the dust density records,
by density of a swarm of points, by figures, by
color, or otherwise, attached to their respective
spatial positions. Numerous illustrations can
be drawn involving sea-life populations and their
density in different locations in the ocean. If
we add seasons of the year such a series becomes
four-dimensional, with time as the fourth
dimension.

The one-dimensional spatial series is
uncommon. A precise illustration of such is to be
found in a chromosome map. The location of genes
in a chromosome is strictly linear. To present
such a series we require a line, straight or
curved provided it does not cross itself, with
the location of the genes located thereon. In
this simple case, if the line with the located
genes is not given, but only the names of the
genes, the spatial series disappears and we have
nothing but a qualitative series left.

When we note that debased spatial,
quantitative, and temporal series reappear as
qualitative series, we are led to ask if qualitative
series which are not recognized as degenerate
series of the other types would not in fact be
seen to be such if fuller information were
available. We may also note in this connection that
the statistics, deriving from frequencies and
proportions in classes, appropriate to
qualitative series are far more primitive in their
nature, though at the same time more universal
in their applicability, than the statistics that
serve a quantitative series. Both because of
this and because of the loss of information which
is common when data are merely thrown into
categories, the student should be content with
qualitative data and treatment only in case the data
cannot be cast into one of the other three types.

It is somewhat difficult to classify as two-
or as three-dimensional data distributed over
the surface of the globe. Since the data all
lie upon the surface a much more limited type
exists than the general three-dimensional spatial
series. Certainly if we are concerned with so
small a portion of the surface of the sphere that
it can approximately be represented by a plane,
we have a two-dimensional spatial series. This
is the most common situation and we shall think
of geographic series as essentially
two-dimensional spatial series.

In dealing with data attached to the
forty-eight states of the United States we should
follow the tabulating practice of the U. S.
Census. This is shown in the stub of Table IV S.
If we desire to present the population of the
states as a geographic series we must represent
the population figures upon a map in their
appropriate positions, by writing the population
figures, by varied cross-hatching, by shading,
or by stippling. In these last three cases the
greater the density of population the heavier
the hatching, shading, or stippling. Shading or
stippling generally yields the more accurate
visual impression.

The population data just mentioned may be 
called a quantitative geographical series because 
different quantities of a single thing are at- 
tached to the different geographical positions. 
Should the chief products of the states be shown
on a map, a steer being drawn on Texas, a shock
of wheat on Kansas, etc., we have a qualitative
geographic series. This latter is very common
in popular portrayal and is serviceable if there 
is no attempt to convey other than qualitative 
information. Of course the veriest tyro is aware 
that such a presentation fails to reveal im- 
portant quantitative aspects that exist. 

The most primitive geographic series merely 
shows that different regions exist. Such regions 
are defined by the boundaries given on the map. 
TABLE IV S

CERTAIN POPULATION, AREA, AND SCHOOL ATTENDANCE
STATISTICS OF THE UNITED STATES, BY STATES, 1940*

                                              PERSONS OF AGE 16-20
DIVISION AND STATE    POPULATION     AREA IN     TOTAL   ATTENDING  PER CENT
                       IN 1000's   SQ. MILES IN 1000's   IN 1000's IN SCHOOL

NEW ENGLAND
  Maine
  New Hampshire
  Vermont
  Massachusetts
  Rhode Island
  Connecticut
MIDDLE ATLANTIC
  New York
  New Jersey
  Pennsylvania
EAST NORTH CENTRAL
  Ohio
  Indiana
  Illinois
  Michigan
  Wisconsin
WEST NORTH CENTRAL
  Minnesota                           80,009       256         113      44.3
  Iowa                                55,986       230          98      42.7
  Missouri                            69,270       331         126      37.9
  North Dakota
  South Dakota
  Nebraska
  Kansas
SOUTH ATLANTIC
  Delaware                             1,978        24           9      38.3
  Maryland                             9,887       169          56      32.9
  Dist. of Columbia                       61        51          24      46.9
  Virginia
  West Virginia            1,902      24,090       200          73
  North Carolina           3,572      49,142       401         132
  South Carolina           1,900      30,594       224          69
  Georgia                  3,124      58,518       330          99
  Florida                  1,897      54,262       170          65
EAST SOUTH CENTRAL        10,778     180,568     1,106         375
  Kentucky                 2,846      40,109       287          81
  Tennessee                2,916      41,961       293         100
  Alabama                  2,833      51,078       297         107
  Mississippi              2,184      47,420       228          87
WEST SOUTH CENTRAL        13,065     430,829     1,305         505
  Arkansas                 1,949      52,725       201          72
  Louisiana                2,364      45,177       239          82
  Oklahoma                 2,336      69,283       236         109
  Texas                    6,415     263,644       629         242
MOUNTAIN                   4,150     857,836       395         188
  Montana                    559     146,316        52          26
  Idaho                      525      82,808        52          26
  Wyoming                    251      97,506        24          11
  Colorado                 1,123     103,967       102          47
  New Mexico                 532     121,511        53          22
  Arizona                    499     113,580        47          19
  Utah                       550      82,346        57          32
  Nevada                     110     109,802         8           4
PACIFIC                    9,733     320,130       793         422
  Washington               1,736      66,977       149          79
  Oregon                   1,090      96,350        92          46
  California               6,907     156,803       551         296

* Abstract of the Sixteenth Census of the U.S., 1940, p. 45
and 76, Population, Second Series, Characteristics of the
Population, United States Summary.

To make these regions more easily observed they
may be colored differently. It is not necessary
to employ as many colors as there are regions,
for non-contiguous regions may be colored the
same without confusion. Experience, rather than
mathematical deduction, shows that if the region
to be mapped is the globe or any portion thereof
four colors will always suffice to meet the
requirement that no two contiguous regions shall
be colored the same. In any actual case it may
call for trial and retrial to find the proper
order of coloring to insure that this minimal
number four suffices.
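The "trial and retrial" just described can be mechanized as a backtracking search. The sketch below is one possible arrangement, not a procedure given in the text: the function name and color names are assumptions, and the contiguity list covers only the New England states and New York, by way of a small illustration.

```python
def color_map(neighbors, colors=("red", "green", "blue", "yellow")):
    """Color regions by trial and retrial (backtracking) so that no two
    contiguous regions share a color. Returns None if the colors do not suffice."""
    # Try the most hemmed-in regions first; this tends to shorten the search.
    regions = sorted(neighbors, key=lambda r: -len(neighbors[r]))
    assignment = {}

    def place(k):
        if k == len(regions):
            return True
        region = regions[k]
        for color in colors:
            if all(assignment.get(nb) != color for nb in neighbors[region]):
                assignment[region] = color      # trial
                if place(k + 1):
                    return True
                del assignment[region]          # retrial
        return False

    return assignment if place(0) else None


# Land borders among the New England states and New York.
borders = [("ME", "NH"), ("NH", "VT"), ("NH", "MA"), ("VT", "MA"),
           ("VT", "NY"), ("MA", "NY"), ("MA", "RI"), ("MA", "CT"),
           ("RI", "CT"), ("CT", "NY")]
neighbors = {}
for a, b in borders:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

coloring = color_map(neighbors)
print(coloring)
```

The search abandons a color as soon as it clashes with an already colored neighbor and returns to the previous region, which is the retrial the text has in mind.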

Frequently the map with the recorded phenomena
upon it constitutes the end product, or the
terminal geographic statistic. However,
computational techniques are appropriate to the
handling of important issues that commonly arise.
When the nearly waste land that is now Gary,
Indiana, was established as a steel center, it
was because of its strategic geographic location.
To be considered was the cost of bringing ore
from Superior, coal from southern Illinois, and
of distributing finished products to railways
and industrial plants in the Middle West and
beyond. Taking these and other necessary
considerations into account, there existed some
location that would minimize costs and maximize
profits. Gary, after a careful survey, was judged
to be it. The problems of geographic series may
become very complex and they arise in many
business and social intercourse situations.

The school superintendent recommending the
building of a new school should do so primarily
to minimize the distance and the hazards that
pupils are subjected to in going from home to
school. Of the available locations, undoubtedly
some one better accomplishes these things than
any other. The basic treatment of this problem
requires first of all a plotting of a school
population map, as estimated to exist at a future
date, corresponding to one-half the life of the
school building to be constructed. Upon this map
lines may be drawn representing the hazards, e.g.,
more lines for railway crossings and boulevards
than for residential streets. Then each site may
be tested by making computations giving the sum
of pupil distances from school and the sum of
hazards crossed. If some such combining concepts
as "one-half mile of distance is equally
disadvantageous to crossing one boulevard" are made,
distance debits and hazard debits may be combined
and a single figure obtained representing the
total debit to be attached to a site. A
comparison of sites by this means will disclose that
site having minimal distance and hazard debits.
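The site comparison can be sketched in a few lines. Everything in the example is hypothetical: the pupil homes, the two candidate sites, and the hazard counts are invented for illustration, straight-line distance stands in for the route actually walked, and the only figure taken from the text is the equivalence of one boulevard crossing to one-half mile.

```python
import math

HAZARD_IN_MILES = 0.5  # one boulevard crossing counted as one-half mile


def site_debit(site, homes, hazards_crossed):
    """Total debit of a site: sum of pupil distances (miles) plus the
    hazards crossed, converted to miles."""
    distance_debit = sum(math.dist(site, home) for home in homes)
    hazard_debit = HAZARD_IN_MILES * sum(hazards_crossed)
    return distance_debit + hazard_debit


# Hypothetical pupil homes, in miles on a flat map.
homes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]

# Hypothetical sites: location, and hazard lines crossed by each pupil in turn.
sites = {
    "Site A": ((0.5, 0.5), [0, 1, 0, 2]),
    "Site B": ((1.5, 1.5), [2, 1, 1, 0]),
}

debits = {name: site_debit(xy, homes, crossed)
          for name, (xy, crossed) in sites.items()}
best_site = min(debits, key=debits.get)
print(debits, best_site)
```

A heavier hazard, such as a railway crossing, would enter simply as more than one line in the count for the pupils who must cross it.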

The procedure suggested is closely related to 
that of determining the center of population. 
Herewith is a brief table, taken from Read (1939), 
of centers in the United States of certain popu- 
lations. 


TABLE IV T

THE CENTERS OF POPULATION OF CERTAIN LEARNED
GROUPS IN THE UNITED STATES

                                                     NEAREST
ORGANIZATION                  LATITUDE   LONGITUDE   LARGE CITY

College enrollment            39°19'     84°45'      Cincinnati
Am. Chem. Soc. by section     40° 5'     83°49'      Dayton
Am. Historical Society        39°54'     82°13'      Columbus
Math. Assn. of America        39°32'     8?°57'      Cincinnati
Am. Assn. of Petroleum
  Geologists                             98°12'      Oklahoma City
Am. Psychological Assn.                  83°35'      Columbus
Am. Assn. of University
  Professors                                         Cincinnati-Dayton
Am. Assn. for the
  Advancement of Science

The method of calculation of such centers
followed, with minor modification, the general
procedure of the United States Census in its
computation of the center of the United States
population, as described in full on page 20, Volume
II, of the 15th Census of the U.S., and as here
quoted in abridged form:


"The center of population may be said to
represent the center of gravity of the population.
If the surface of the United States be considered
as a rigid level plane without weight and the
population distributed thereon, all individuals
being assumed to have equal weight, the point on
which this plane would balance would be the
center of population. . .


"In making the computations for the location
of the center of population it is necessary to
assume that the center is at a certain point.
Through this point a parallel and a meridian are
drawn, crossing the entire country.


"The product of the population of a given
area by its distance from the assumed parallel
is called a north or south moment, and the
product of the population of the area by its distance
from the assumed meridian is called an east or
west moment.


"The population of each square degree north
and south of the assumed parallel is multiplied
by the distance of its center from that parallel;
a similar calculation is made for the principal
cities; and the sum of the north moments and the
sum of the south moments are ascertained. The
difference between these two sums, divided by
the total population of the country, gives a
correction to the latitude. In a similar manner
the sum of the east and of the west moments are
ascertained and from them the correction in
longitude is made."
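The moment computation quoted above reduces to a few sums. The following sketch assumes a flat map with distances in miles measured from the assumed center (north and east positive); the function name and the three areas in the example are hypothetical, introduced only to show the bookkeeping.

```python
def correction_from_moments(areas):
    """areas: (population, miles north of the assumed parallel,
    miles east of the assumed meridian); south and west are negative.
    Returns the (northward, eastward) correction to the assumed center."""
    total = sum(pop for pop, _, _ in areas)
    north = sum(pop * ns for pop, ns, _ in areas if ns > 0)
    south = sum(pop * -ns for pop, ns, _ in areas if ns < 0)
    east = sum(pop * ew for pop, _, ew in areas if ew > 0)
    west = sum(pop * -ew for pop, _, ew in areas if ew < 0)
    return (north - south) / total, (east - west) / total


# Hypothetical areas: population, miles north, miles east of the assumed point.
areas = [(1000, 10.0, -20.0), (3000, -5.0, 40.0), (2000, 0.0, 10.0)]
north_correction, east_correction = correction_from_moments(areas)
print(north_correction, east_correction)
```

Whatever point is first assumed, adding the corrections to it gives the same center, so the choice of the assumed point affects only the labor of the arithmetic.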
As a simple center of population problem let
us consider the following, which we will call
the Problem of the Statisticians' Picnic: Twenty
Massachusetts statisticians with residences as
given in Table IV U decide to have a picnic and,
without prior determination, agree that the site
shall be the center of population of the twenty.


TABLE IV U

PICNICKING STATISTICIANS OF MASSACHUSETTS

NAME OF        PLACE OF            NAME OF        PLACE OF
STATISTICIAN   RESIDENCE           STATISTICIAN   RESIDENCE

Adams          N. Adams            King           Gloucester
Pitt           Pittsfield          Bridge         Cambridge
Barr           Great Barrington    Bean           Boston
Spring         Springfield         Fern           Boston
Green          Greenfield          Ivy            Boston
Woo            Worcester           Rock           Plymouth
Fitch          Fitchburg           Town           Provincetown
Ames           Amesbury            Ford           New Bedford
Lawrence       Lawrence            Wood           Woods Hole
Lowell         Lowell              Wells          Wellesley

Where is this Mecca? We solve this problem by
taking measurements on a map of Massachusetts,
Chart IV XVI. We choose some arbitrary point,
say Wellesley, somewhere near the center of
population, and draw a meridian and a parallel through
it. Assuming a flat surface and using the scale
of the map and a pair of calipers, we determine
the north-south and the east-west coordinates of
each of the twenty residences. The figures of
Table IV V result.


TABLE IV V

NORTH-SOUTH AND EAST-WEST COORDINATES
OF RESIDENCES FROM WELLESLEY

[Table body not recoverable. Column headings include (N-S DISTANCE)² and (E-W DISTANCE)². Surviving entries: 47657 and 2382.85.]



Computing sums and means we find that the
center of population of these twenty is 2.05
miles north and 12.05 miles west of Wellesley.
Had the faith of the statisticians in their
subject been wavering, it would have been revived
as they found their rendezvous: a beautiful
little island in the middle of a fine body of
water.

The center of population defined in line with
the computations made has the important property
of minimizing the sum of the squared distances,
or the quadratic mean distance. For the picnic
problem we find this to be

$$\sqrt{540.775 - (2.05)^2 + 2382.85 - (12.05)^2}, \text{ or } 52.67 \text{ miles.}$$
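The quadratic mean distance is the square root of the mean squared distance from the center, which in each direction is the mean of the squares less the square of the mean. The sketch below evaluates the picnic expression from the four figures quoted in the text, and then checks the minimizing property on a handful of hypothetical places, since the twenty coordinates themselves are not reproduced here. Function names are assumptions.

```python
import math


def quadratic_mean_distance(mean_ns_sq, mean_ns, mean_ew_sq, mean_ew):
    """sqrt(mean(ns^2) - mean(ns)^2 + mean(ew^2) - mean(ew)^2)."""
    return math.sqrt(mean_ns_sq - mean_ns ** 2 + mean_ew_sq - mean_ew ** 2)


def sum_squared_distances(point, places):
    px, py = point
    return sum((x - px) ** 2 + (y - py) ** 2 for x, y in places)


# The picnic problem, from the figures quoted in the text.
picnic = quadratic_mean_distance(540.775, 2.05, 2382.85, 12.05)
print(round(picnic, 2))

# Hypothetical places, to illustrate the minimizing property.
places = [(0.0, 0.0), (4.0, 0.0), (0.0, 6.0), (8.0, 2.0)]
center = (sum(x for x, _ in places) / len(places),
          sum(y for _, y in places) / len(places))
at_center = sum_squared_distances(center, places)
elsewhere = sum_squared_distances((3.0, 3.0), places)
print(center, at_center, elsewhere)
```

Any point other than the center gives a larger sum of squared distances; a site minimizing the plain mean distance would in general lie elsewhere.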

Should the desire be to minimize the mean, or
the median, or the modal distance, other
definitions and other computational procedures would
be necessary. Concerning the modal distance we
may note that if the place of greatest density
is chosen the number at zero distance from the locus
is maximized, but if the frequency of cases at
distance x (x ≠ 0) is to be maximized, some point
(or points) other than this place of densest
population will in general constitute the
solution.

If the geographic phenomena studied are con- 
tinuous and show moderate rates of change over 
the region studied, they can be excellently 
portrayed by means of contour lines, as is done 
in isothermal and isobaric weather maps and in 
Coast and Geodetic Survey maps giving land ele- 
vations and sea depths. Military maps of terrain 
illustrate a most precise portrayal of a geo- 
graphic series. 
Chart IV XVII taken, with permission, from
Birth Control Review, Vol. 17, no. 4, attempts
to depict the status of sterilization laws in the
United States in 1933. It is not entirely
satisfactory for the states shown in black do not
constitute a uniform class. There is, of course,
wide diversity in the statutes in these several
states.


CHART IV XVII
STERILIZATION LEGISLATION IN THE UNITED STATES
(1933)

[Map of the United States.]

The mapping on a two-dimensional sheet of so
large a portion of the surface of the globe that
it cannot be represented satisfactorily by a
plane follows different practices depending upon
what feature, or features, it is most important
to portray correctly.

The Mercator projection of the surface of the
globe is commonly pictured as obtained by
circumscribing the globe with a thin paper cylinder
which touches it at the equator and projecting the
points on the globe onto the outside of the
cylinder, by using lines that pass through the
center of the globe; strictly, the spacing of the
parallels is fixed by calculation rather than by
such lines. This projection has the important
property of preserving all compass directions, as
is important to the navigator, but areas and
distances are not preserved and are, in the arctic
regions, greatly exaggerated.

An orthographic projection is obtained by pro- 
jecting the points of the surface of one-half the 
globe onto a plane which touches the globe at one 
point (the center of the map),—the projecting 
lines being perpendicular to the plane. Two such 
circle maps are commonly found in an elementary 
geography to picture the entire global surface, 
but distances and areas are greatly distorted, 
except in the neighborhoods of the two centers. 

Another mapping designed to serve a single
spot is an azimuthal equidistant projection. If
Washington were made the center of such a map the 
rest of the globe would be so presented that all 
distances from Washington would be correctly indi- 
cated by the straight-line distances from Wash- 
ington. Thus all places 1000 miles away would 
lie on a circle of indicated radius 1000 miles, 
having its center at Washington. A dimension at 
right angles to the distance-from-Washington 
dimension is more and more exaggerated as the 
distance from Washington becomes greater, but the 
fact that all straight lines through Washington 
correspond to great circles on the globe through 
Washington is a merit when the issue is distance 
from Washington. A company engaged in inter- 
national trade might well desire such a map with 
its headquarters, or producing plant, as the 
center. 
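The defining property of the azimuthal equidistant map, that plotted distance and direction from the center agree with the great-circle distance and compass bearing on the globe, can be put into a short computation. This is a sketch on a spherical earth; the radius, the function name, and the approximate coordinates used for Washington and London are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius of a spherical earth


def azimuthal_equidistant(lat, lon, center_lat, center_lon):
    """Plane coordinates (x east, y north, in miles) of a place on an
    azimuthal equidistant map centered at (center_lat, center_lon)."""
    p1, p2 = math.radians(center_lat), math.radians(lat)
    dl = math.radians(lon - center_lon)
    # Great-circle angular distance from the center (haversine form).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    angle = 2 * math.asin(min(1.0, math.sqrt(h)))
    # Initial compass bearing from the center, clockwise from north.
    bearing = math.atan2(
        math.sin(dl) * math.cos(p2),
        math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl),
    )
    r = EARTH_RADIUS_MILES * angle
    return r * math.sin(bearing), r * math.cos(bearing)


washington = (38.9, -77.0)   # approximate latitude and longitude
x, y = azimuthal_equidistant(51.5, -0.1, *washington)   # London, approximate
print(round(math.hypot(x, y)))  # plotted distance from Washington, in miles

# A place 10 degrees due north of the center plots straight up the sheet.
nx, ny = azimuthal_equidistant(48.9, -77.0, *washington)
```

Only distances measured from the center are true; the distance between two other places read off such a map is in general distorted, increasingly so toward the rim.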

A gnomonic projection is one constructed with
the center of the sphere the vantage point, but
when viewed by the reader the viewpoint is not
the center, but from points outside the sphere.
Only a portion of the globe can be so represented.
The merit of this projection is that for the
portion shown the straight line connecting any
two points (not merely the center and a second
point) corresponds to the great circle on the
globe connecting these two points.

There are mappings which can be made of all,
or portions of, the globe so as to preserve areas.
One such is the Aitoff equal-area projection map.
It is elliptical in shape, with the equatorial
dimension twice that of the inter-polar dimension.
Another is the sinusoidal interrupted projection,
which consists of a succession of lens-shaped
parts whose touching points lie along the equator.

Though the globe is the unbiased authority for 
spheric relationships, still these sundry two- 
dimensional maps do well serve the special prob- 
lems involving the direction, distance, or area, 
or a particular point of reference. 


SECTION 6. GRAPHIC PORTRAYAL IN THREE DIMENSIONS 


The two-dimensional base is serviceable in
other than geographic problems. These essentially
are situations involving correlated variables, in
which the frequency of occurrence varies depending
upon the joint values of two variables. If the
variables are quantitative we have the common
scatter diagram underlying simple correlation.
If the data are qualitative we have the common
contingency table underlying coefficients of
contingency and $\chi^2$ tests of independence of
variables. In either of these cases it may be
serviceable to present the data graphically by
means of a block diagram.

A rectangular parallelopiped, for brevity here
called a block, has three dimensions. As it
stands upon a flat surface the two dimensions in
[...] be drawn in perspective and the vantage point
from which the blocks are observed must be such
that at least a part of each block shall be
visible. If some block is very low, i.e., the
corresponding frequency small, so that it is a well
surrounded by towering neighboring blocks, or at
least touching towering neighboring blocks in
front of it, the vantage point must be at a high
elevation in order to glimpse any portion of this
block. However, just as hilly country looks flat
when viewed from a high-flying aeroplane, so
distinctions in the elevations of blocks are
difficult if the vantage point is very high. The
requirements of low vantage point and complete
visibility are frequently antagonistic. It is
accordingly true that many data cannot well be
represented by means of a block diagram. If it
is possible to envisage every block from a
relatively low vantage point, it may be expected that
presentation by means of a block diagram will be
clear and informative. A very common situation
arises where one only of the two variables is
qualitative, the other being either temporal or
quantitative. In this case the temporal, or
quantitative, order must be preserved in the
corresponding dimension, but the order in the
qualitative variable can be chosen because of
lucidity of presentation and for no other reason.
The data of Table IV W as represented in Chart
IV XVIII, illustrate this case.

The successive steps in the construction of a
block diagram: We will illustrate these by means
of the data given.

The states are listed in alphabetic order in
Table IV W. It is not important to preserve this,
and to do so would increase the difficulties of
presentation. An arrangement of states based
upon the populations in 1930 is suggested, but
generally a preferable one, to avoid hidden
TABLE IV W

GROWTH IN POPULATION OF THE EIGHT STATES OF THE U.S.
HAVING THE LARGEST POPULATION IN 1930

                 POPULATION IN MILLIONS AT GIVEN DATES
                 1870     1890     1910     1930     SUMS

California                         2.38               9.93
Illinois                           5.64              19.64
Massachusetts                      3.37              11.32
Michigan                           2.81              10.92
New York                                             32.08
Ohio                               4.77              17.76
Pennsylvania                       7.67              26.08
Texas                                                12.78


CHART IV XVIII
GROWTH OF POPULATION OF EIGHT LARGEST STATES

[Block diagram.]



blocks, is an arrangement based upon the "sums"
column. We will therefore plan to have the tiers
of blocks, from rear to front, give the population
data for the states in the order: New York,
Pennsylvania, Illinois, Ohio, Texas, Massachusetts,
Michigan, California. Since there are
eight states and only four dates, it will make
for ease of lettering and reading if the axes are
arranged as shown in Chart IV XVIII rather than
in the reverse manner. A good general plan is to
build up by means of sugar cubes, dominoes, or
other blocks, an approximately correct structure,
and then survey it from all angles until the
vantage point is found yielding the clearest
picture.
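
The ordering of tiers by the "sums" column can be checked mechanically. The following is a minimal sketch only; the dictionary name `sums` and the function name `tier_order` are illustrative assumptions, and the figures are the "sums" entries of Table IV W.

```python
# "Sums" column of Table IV W (population in millions, summed over the four dates).
sums = {
    "California": 9.93,
    "Illinois": 19.64,
    "Massachusetts": 11.32,
    "Michigan": 10.92,
    "New York": 32.08,
    "Ohio": 17.76,
    "Pennsylvania": 26.08,
    "Texas": 12.78,
}


def tier_order(totals):
    """Return the state names from rear tier to front tier (largest sum first)."""
    return sorted(totals, key=totals.get, reverse=True)


order = tier_order(sums)
print(order)
```

Placing the largest total at the rear is what keeps the shorter blocks in front from being hidden.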

The actual steps showing the necessary con-
struction lines involved in the construction of
the rearmost block are shown in Chart IV XIX.
In order they are as follows:

Draw horizontal line AB. 

At some point, O, on it erect perpendicular CO.

At some point, D, whose height depends upon
the height of the vantage point, but is in general
higher than the highest block, draw horizon line
FE parallel to AB.

On this line choose some point, E, for the
right vanishing point and draw OE.

Choose some point, F, for the left vanishing
point and draw OF. Angle EOF depends upon the
distance of the vantage point above and in front
of point O: the greater the height the smaller
the angle, and the greater the distance in front
the larger the angle. This angle of necessity
must be greater than 90° and less than 180°. An
angle of 110° is frequently satisfactory, but a
better estimate of it can be gotten by observing
the angle in the domino structure when viewed
from the desired point.

On OB lay off conveniently equally spaced 
intervals for the right horizontal scale. 

On OA lay off conveniently equally spaced
intervals for the left horizontal scale.

On OC lay off conveniently equally spaced
intervals for the vertical scale, which must be
such that the value for the tallest block falls
below point D if it is desired to see the top of
this block. The fact that the top of the tallest
block cannot be seen in Chart IV XVIII is some-
what of a disadvantage.

Draw a grid representing the bases of the
blocks by drawing lines from the division points
on OB to F and from the division points on OA to E.

Draw the most remote block first, erecting
perpendiculars from the four corners (three if
the block extends above the horizon) of its base.

Erect HG, a perpendicular at G, the front 
corner of the rear tier of blocks. 

On OC mark the proper height (12.59) for this 
most remote block. 

Draw a line from this point (12.59) to F, cut-
ting HG at I.

From I draw a line, ILME, to E, cutting JK at 
L, which is the upper front corner of the most 
remote block. LM is the upper right front edge 
and LN, a part of LF, is the upper left front 
edge. If the block extends above the horizon 
these two edges are all that can be seen, but if 
below the horizon the two rear edges are also to 
be drawn as shown by N'P', a part of N'E, and 
M'P', a part of M'F, had this rear block been 
6.00 high instead of 12.59. 

By similar procedure the remaining blocks in 
the rear tiers are constructed, erasing such of 
the lines already drawn as become covered by 
blocks in front. 

Continue, always first constructing the most 
remote of the blocks remaining. 

An alternative method for obtaining point L is
to draw OJ and extend it to the horizon line FE,
which it cuts at Q. Draw a line from point 12.59
on OC to Q. This line will cut JK at L. This
alternative method is shorter to apply in most
cases, but it breaks down if Q happens to be at
the intersection of OC and FE.

If any block is completely hidden, it is
necessary to start from the beginning again,
using a different order in the rows, a higher
perspective, or a different observation point.

Drawing front edges slightly heavier, shading 
one face of the blocks, and recording heights on 
tops of blocks are frequently advantageous. 

Label in perspective left and right horizontal 
dimensions, add title, and erase all construction 
lines. 
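
The grid step and the alternative method for locating L can be sketched with ordinary line intersections. This is an illustration under assumed coordinates, not the author's drafting procedure: O is taken as the origin, AB as the x-axis, OC as the y-axis, and the horizon as a horizontal line; the function names `intersect`, `grid_point`, and `block_top` are hypothetical.

```python
def intersect(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < 1e-12:
        raise ValueError("lines are parallel or coincident")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)


def grid_point(left_mark, right_mark, E, F):
    """Base corner where the line from a mark on OA to E meets the line
    from a mark on OB to F (marks are distances from O along AB)."""
    return intersect((-left_mark, 0.0), E, (right_mark, 0.0), F)


def block_top(J, height, horizon_y):
    """Alternative method: extend OJ to cut the horizon at Q, join the
    height marked on OC to Q, and find where that line cuts the vertical at J."""
    O = (0.0, 0.0)
    Q = intersect(O, J, (0.0, horizon_y), (1.0, horizon_y))
    return intersect((0.0, height), Q, J, (J[0], J[1] + 1.0))


# Assumed layout: horizon 15 units above AB, vanishing points 20 units either side of OC.
E, F = (20.0, 15.0), (-20.0, 15.0)
J = grid_point(1.0, 4.0, E, F)
L = block_top(J, 12.59, 15.0)
print(J, L)
```

When J lies on OC, Q falls at the intersection of OC and FE, the two lines in `block_top` coincide, and the function raises `ValueError`; this is the breakdown noted in the text, and the longer construction through I must then be used.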

The time scale is continuous, and the blocks
from left to right are appropriately in contact
with each other. As the states are discrete,
however, a diagram with small blank spaces between
the blocks from front to rear would constitute a more
accurate portrayal, though it would be much more
difficult to draw than was Chart IV XVIII.


CHART IV XX
GROWTH OF POPULATION OF EIGHT LARGEST STATES
Legend: Population in millions

The cross-hatched plat: An alternative method
of presentation of the growth in population data
of Table IV W is by means of a plat with stip-
plings, shadings, or cross-hatchings to indicate
the different frequencies, as shown in Chart IV
XX.


The plat is more flexible than the block dia- 
gram method, but in general it requires a coarse 
frequency grouping and its features are not 
vividly depicted. 


PROBLEMS 


Problem 1. Plot temporal data of Table IV X. 


TABLE IV X
CHINESE AND JAPANESE POPULATION IN THE U.S.:
1860 TO 1940*

CENSUS YEAR    CHINESE    JAPANESE
1940            77,504     126,947
1930            74,954     138,834
1920            61,539     111,010
1910            71,531      72,157
1900            89,863      24,326
1890           107,488       2,039
1880           105,465         148
1870            53,199          55
1860            34,933           -

*From the Sixteenth Census of the U.S., 1940


Problem 2. From 1919-20 data of Table IV Y 
compute the rate of mortality per year. Make 
necessary allowance in computations for the 
unequal age intervals reported. Plot male and 
female rate of mortality curves on same coordinate 
paper. 




TABLE IV Y

LIFE TABLES FOR WHITE MALES AND FEMALES IN THE
ORIGINAL REGISTRATION STATES*

EXACT AGE            1910#                1919-1920
IN YEARS        MALES    FEMALES      MALES    FEMALES
0 months      100,000    100,000    100,000    100,000
| month 95,155 96,213 

2 months 93, 914 95,222 

Ч months 92, 039 93,632 


6 months 92,406 


90, 757 792, 639 


89, 050 91, 070 
86,411 88, 703 
85,321 87, 739 
в, ПТ | 85,573 
82,299 84, 819 
80, 129 82,394 
77, 650 79,615 
74, 899 76, 917 
71,879 | 74,1% 
68,441 71,070 
64, 31 67, 188 


58, 886 


17, 522 
82 8, 962 


87 3,317 
92 829 1,575 
97 129 
102 10 


* The original registration states include New England states,
New York, New Jersey, District of Columbia, Indiana, and Michigan.

# Based on the estimated population July 1, 1910 (11,706,221),
and on the reported deaths in 1909 (160,227), in 1910 (170,223),
and in 1911 (165,918).

Reference: James W. Glover (1923) and Elbertie Foudray (1923).

Problem 3. Examine Table IV Z. What seems to
be the most striking phenomenon revealed? Make
a graph to show this accurately.


TABLE IV Z
SCHOOL ATTENDANCE IN GEORGIA AND WASHINGTON BY AGE: 1940*

                        GEORGIA                 WASHINGTON
                    TOTAL    ATTENDING       TOTAL    ATTENDING
                   NUMBER       SCHOOL      NUMBER       SCHOOL
7 to 13 years     449,562      413,299     171,555      166,945
14 and 15 years   128,916      101,211      54,151       51,675
16 to 20 years    330,280       98,567     149,134       79,056
21 to 24 years    239,448        6,938     118,905        9,328

* Sixteenth Census of the U.S., 1940: POPULATION, Second
Series, Characteristics of the Population, United States
Summary, p. 76.


Problem 4. Choose the most appropriate inter- 
val possible and plot data of Table IV AA as a 
frequency polygon. Suggest an improvement in 
the original data which would have made still 
more comparable the performances of short and 
tall men. 


Problem 5. Plot the data of Table IV AB as a
percentile graph. We will first make certain
nearly arbitrary assumptions: (a) that "under
1000" covers the range "400 to 1000"; (b) that
"villages not incorporated" cover the range "30
to 400"; and (c) that "rural farms" cover the
range "1 to 30." A community of size less than 1
is impossible, so a zero percentile of .5 (the
boundary between 0 and 1) may be plotted. The
100 percentile should not be plotted, since the
maximum possible size of community is not deter-
minable from this sample. Because of the great
range, 1 to 6,930,446, use a logarithmic scale of
sizes of communities (ordinate). The abscissa may

TABLE IV AA

STANDING HIGH KICK*
Distances Measured to the Nearest Inch
Above or Below Standing Height (s.h.)

Score in Inches     Frequency
Above s.h.
       19               1
       18               1
       17               1
       16               2
       15               3
       14               2
       13               1
       12               7
       11               3
       10               9
        9               9
        8               7
        7              12
        6              19
        5              22
        4              18
        3              25
        2              16
        1              10
        0              12
Below s.h.
        1               7
        2               7
        3               1
        4               2
        5               1
        6               2
        7               2
        8               0
        9               1

N = 203

* Frederick Warren Cozens, THE MEASUREMENT OF GENERAL ATHLETIC
ABILITY IN COLLEGE MEN, Univ. of Oregon Publication, Physical
Education Series, Vol. 1, No. 3, April, 1929, page 188.

TABLE IV AB
URBAN-RURAL DISTRIBUTION OF POPULATION OF U.S. IN 1930*

SIZE OF COMMUNITIES              NUMBER     POPULATION
Cities
  6,930,446                           1      6,930,446
  1,000,000 - 4,000,000               4      8,134,109
  400,000 - 1,000,000                13      8,067,471
  200,000 - 400,000                  23      6,508,600
  100,000 - 200,000                  52      6,685,110
  25,000 - 100,000                  283     12,917,141
  10,000 - 25,000                   606      9,097,200
  5,000 - 10,000                    851      5,897,156
  2,500 - 5,000                   1,332      4,717,590
Villages
  1,000 - 2,500                   3,087      4,820,707
  Under 1,000                    10,346      4,362,746
Villages not incorporated                   14,479,257
Rural Farms                   6,288,648     30,157,513

* Abstract of the Fifteenth Census of the U.S., 1930, pp. 16, 17,
18, 19, 21.

be "Percentage of people residing in communities
of smaller size than size indicated." Since only 13
classes are available, if percentiles corre-
sponding to the boundaries of these classes are
computed and plotted, the result will be appreci-
ably more accurate than if integral percentiles
such as P.05, P.10, etc., are computed and
plotted.
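
The percentiles at the class boundaries follow from cumulating the population column of Table IV AB, smallest communities first. The sketch below is one possible arrangement; the variable names are illustrative, and the three lowest boundaries (30, 400, and 1,000) rest on the nearly arbitrary assumptions stated in the problem.

```python
# (upper class boundary, population) from Table IV AB, smallest communities first.
classes = [
    (30, 30_157_513),        # rural farms, assumed range 1 to 30
    (400, 14_479_257),       # villages not incorporated, assumed range 30 to 400
    (1_000, 4_362_746),      # villages under 1,000, assumed range 400 to 1,000
    (2_500, 4_820_707),
    (5_000, 4_717_590),
    (10_000, 5_897_156),
    (25_000, 9_097_200),
    (100_000, 12_917_141),
    (200_000, 6_685_110),
    (400_000, 6_508_600),
    (1_000_000, 8_067_471),
    (4_000_000, 8_134_109),
    (None, 6_930_446),       # the largest city; no upper boundary, so no point is plotted
]

total = sum(pop for _, pop in classes)

# Zero percentile at .5, the boundary between sizes 0 and 1.
points = [(0.5, 0.0)]
running = 0
for upper, pop in classes:
    running += pop
    if upper is not None:
        points.append((upper, 100.0 * running / total))

for size, pct in points:
    print(f"{size:>12,}  {pct:6.2f}")
```

Each pair gives a community size (to be plotted on the logarithmic ordinate) and the percentage of people living in communities smaller than that size (the abscissa).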

Problem 6. Plot data of Table IV AC according
to the five ways for drawing a graph
mentioned in Section 3 (IV, p. 140) and compare
results. Employ method (c) first in order to be
uninfluenced by (a), (b), (d), or (e).

TABLE IV AC
HEAD LENGTHS OF 1306 NON-HABITUAL CRIMINALS*

HEAD                HEAD                HEAD
LENGTH  FREQUENCY   LENGTH  FREQUENCY   LENGTH  FREQUENCY
 210         1       196        66       182        28
 209         1       195        56       181        17
 208         0       194        83       180        13
 207         6       193        79       179        12
 206         3       192       102       178         7
 205         8       191        96       177         5
 204        13       190        85       176         3
 203        14       189        83       175         3
 202        24       188        68       174         0
 201        20       187        55       173         2
 200        30       186        57       172         1
 199        35       185        53       171       ...
 198        43       184        43       170         0
 197        56       183        24

                                         Total    1306

* Adapted from data given on page lxxv, TABLES FOR STATISTICIANS
AND BIOMETRICIANS, edited by Karl Pearson, Vol. 1, 1914.


In employing method (d) we will smooth by re-
placing each original class frequency by the
average of the 5 nearest class frequencies, e.g.,
6, the frequency for class 207, is replaced by

$$3.6 = \frac{1 + 0 + 6 + 3 + 8}{5},$$

etc., for all classes from 168 to 213 inclusive.
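
The smoothing rule can be tried on the upper tail of the table. This is a minimal sketch: the function name `smooth5` is an assumption, only the classes 205 to 210 are entered, and classes beyond the tabled range are counted as zero, which is why the smoothed values may extend to class 213.

```python
# Upper tail of Table IV AC: head length -> frequency.
freq = {210: 1, 209: 1, 208: 0, 207: 6, 206: 3, 205: 8}


def smooth5(freq, cls):
    """Average of the five class frequencies centred on cls (absent classes count as 0)."""
    return sum(freq.get(c, 0) for c in range(cls - 2, cls + 3)) / 5


# Only classes 207 and above can be smoothed from this partial table,
# since lower classes need frequencies that are not entered here.
smoothed = {c: smooth5(freq, c) for c in range(207, 214)}
print(smoothed)
```

With the whole table entered in `freq`, the same function applies to every class from 168 to 213.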

Method (e) is very simple for these data be- 
cause they are so nearly normal, but its exe- 
cution may be postponed until later chapters, 
dealing with central tendencies, variability, and 
the normal distribution, have been studied.

Problem 7. Using data of Table IV AC, compute
Pigg 2 Р, ар om dip Poo? 
also the lowest and highest percentiles that are 


Р су» and 


permissible. 


Compute the standard errors of each of these 
percentiles. 


Problem 8. Using data of Table IV W, select
scales, left, right, and vertical, which are
quite different from those of Chart IV XVIII, and
construct a block diagram. "Proofread" for
accuracy by careful visual comparison of relative
heights with those of Chart IV XVIII and by
sighting down lines to see if they do converge
as required.


Problem 9. The accompanying problem involves
plotting contours and interpolating by means one
degree more refined than is linear interpolation,
and so as to preserve the frequencies in intervals
rather than to preserve the frequencies at spe-
cific points (class indexes). It is entirely
feasible to present the data of Table IV AD in a
block diagram, but they can also be well shown by
frequency contour lines on a mat, the horizontal
and vertical dimensions of which are age and
grade. Lines for frequencies 100, 1000, 5000,
10,000, 15,000, 20,000, 25,000, 30,000, and
35,000 will suffice.

One may interpolate in the figures given by 
rows, and also again in the figures given by 
columns, to obtain grade and age points for each 
frequency, 100, 1000, etc. Connecting all the 
points for 100 gives a contour line for this 
frequency, and similarly for the other contour 
lines. Linear interpolation for points on the 
contour line for the frequency 100, though not 
altogether satisfactory, may suffice, but it

TABLE IV AD
AGE-GRADE DISTRIBUTION OF CALIFORNIA SCHOOL CHILDREN
(Compiled by Research Division of the University of California)


hardly will for the other contour lines, for
the age and grade groupings
are coarse and the frequencies large. Since
"under six" includes four- as well as five-year-
olds, neither the "under six" frequencies nor
any involving them in interpolation will be
entirely satisfactory. The error in treating all
"under six" cases as five-year-olds should not be
great. Since no frequencies are given for kinder-
garten classes, no interpolated frequency values
involving grade .5 (middle of the kindergarten
year) will be satisfactory. Because of the lack
of kindergarten data neither linear nor parabolic
interpolation as here described should involve
grade .5. Consider the four sequential frequencies
0, 1203, 31206, and 34609. The five contour lines
for frequencies of 5,000, 10,000, 15,000, 20,000,
25,000 will pass between the tabled values 1203
and 31206. Linearly interpolated values would be
undesirably and unnecessarily crude. Para-
bolically interpolated values would yield values
lying on a parabola fitted to three points, 0,
1203, and 31206, or 1203, 31206, and 34609. To
avoid the issue as to which set of three to employ
and to get at the same time somewhat more trust-
worthy results, we may fit a parabola to the
points 1203 and 31206 and as near as possible, in
the least-squares sense, to the neighboring points
0 and 34609. The equation of such a parabola is


$$f = \tfrac{1}{16}\,(-f_1 + 9f_2 + 9f_3 - f_4) + (f_3 - f_2)\,\xi + \tfrac{1}{4}\,(f_1 - f_2 - f_3 + f_4)\,\xi^2 \qquad [4:04]$$

in which f₁, f₂, f₃, and f₄ are the four tabled
frequencies in order and ξ is a deviation in
years from 6.0, i.e., a point halfway between
the class index values for the second and third
tabled values. From this equation ξ may be de-
termined for the needed contour levels, f = 5000,
f = 10,000, f = 15,000, f = 20,000, f = 25,000,
and f = 30,000. For example, for the 5000 level
we have

$$5000 = \tfrac{1}{16}\,(0 + 10827 + 280854 - 34609) + (31206 - 1203)\,\xi + \tfrac{1}{4}\,(0 - 1203 - 31206 + 34609)\,\xi^2$$

yielding ξ = -.371 or -54.2, this latter value
being extraneous as it is obvious that the root
desired will lie between -.5 and .5. The value
ξ = -.371 is equivalent to age 6.629, the age for
grade one at which the frequency is 5000.
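
The arithmetic of this example may be verified with a short computation. The sketch below assumes the coefficients of [4:04] in the form that reproduces the worked figures (constant term with divisor 16, squared term with divisor 4); the function names are illustrative.

```python
import math


def parabola_404(f1, f2, f3, f4):
    """Coefficients (a, b, c) of f = a + b*xi + c*xi**2 as in [4:04]."""
    a = (-f1 + 9 * f2 + 9 * f3 - f4) / 16
    b = f3 - f2
    c = (f1 - f2 - f3 + f4) / 4
    return a, b, c


def roots_at_level(a, b, c, level):
    """Real solutions xi of a + b*xi + c*xi**2 = level, in ascending order."""
    if c == 0:
        return [(level - a) / b]
    disc = b * b - 4 * c * (a - level)
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * c), (-b + r) / (2 * c)])


a, b, c = parabola_404(0, 1203, 31206, 34609)
roots = roots_at_level(a, b, c, 5000)
# Keep only the root lying between the two middle class indexes.
wanted = [xi for xi in roots if -0.5 <= xi <= 0.5]
print(roots, wanted)
```

The same two functions serve for the other contour levels by changing the last argument of `roots_at_level`.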

Though this method should give fair results, 
the following may logically be expected to give 
somewhat better values. We first note that the 
area under any stretch of curve [4:04] corresponds 
to the number of cases in this interval. Let us 
find by integral calculus this area from ξ = -1.00
to ξ = .00. The answer is 1249, which differs from
the correct value by 46. Also, in the next
interval the number of cases under the parabola
is 31252, which is in error by 46. We may avoid
these discrepancies by fitting a parabola having
the observed areas for each of the two middle
intervals and coming as close as possible, in the
least-squares sense, to having the requisite
areas in the first and fourth intervals. The
equation of this parabola differs only in the
constant term from [4:04]. It is

$$f = \tfrac{1}{12}\,(-f_1 + 7f_2 + 7f_3 - f_4) + (f_3 - f_2)\,\xi + \tfrac{1}{4}\,(f_1 - f_2 - f_3 + f_4)\,\xi^2 \qquad [4:05]$$

which is fully as simple to use as [4:04]. For
the frequency level 5000 we have

$$5000 = \tfrac{1}{12}\,(0 + 8421 + 218442 - 34609) + (31206 - 1203)\,\xi + \tfrac{1}{4}\,(0 - 1203 - 31206 + 34609)\,\xi^2$$

yielding ξ = -.370 and -54.2, this latter value
being extraneous. The value ξ = -.370 is equi-
valent to age 6.630, which happens to be negli-
gibly different from 6.629 derived by the pre-
ceding method.
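
The areas under the two parabolas can be compared directly. This is a sketch under the assumption that the two equations differ only in the constant term (divisor 16 with weights 9 for the point-fitting form, divisor 12 with weights 7 for the area-preserving form); the function names are hypothetical.

```python
import math


def coefficients(f1, f2, f3, f4, preserve_areas):
    """Coefficients (a, b, c) of f = a + b*xi + c*xi**2."""
    b = f3 - f2
    c = (f1 - f2 - f3 + f4) / 4
    if preserve_areas:
        a = (-f1 + 7 * f2 + 7 * f3 - f4) / 12   # constant term of [4:05]
    else:
        a = (-f1 + 9 * f2 + 9 * f3 - f4) / 16   # constant term of [4:04]
    return a, b, c


def area(coef, lo, hi):
    """Integral of a + b*xi + c*xi**2 from lo to hi."""
    a, b, c = coef
    return a * (hi - lo) + b * (hi**2 - lo**2) / 2 + c * (hi**3 - lo**3) / 3


freqs = (0, 1203, 31206, 34609)
p404 = coefficients(*freqs, preserve_areas=False)
p405 = coefficients(*freqs, preserve_areas=True)
for name, p in (("[4:04]", p404), ("[4:05]", p405)):
    print(name, round(area(p, -1, 0)), round(area(p, 0, 1)))

# Root of [4:05] at the 5000 level that lies between -.5 and .5.
a, b, c = p405
xi = (-b + math.sqrt(b * b - 4 * c * (a - 5000))) / (2 * c)
print(round(xi, 3))
```

The first parabola overstates each middle interval by about 46 cases, while the second reproduces the tabled frequencies 1203 and 31206 as areas.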

The determination, using [4:05] of all other 
contour points and the plotting of contour lines 
is left as an exercise. For checking purposes a 
few values for the contour level 5000 are given 
herewith: (age 7.5, grade 3.287), (age 8.5,
grade 4.322), (grade 1.5, age 6.630 and also age
9.161), (age 6.5, grade 2.240). In the calcula-
tion of this last value we have the data

            Grade   Grade   Grade   Grade
             1.5     2.5     3.5     4.5

Age 6.5     31206    1167      34       0


The contour point for level 5000 lies between 
grade 1.5 and grade 2.5, but as we have no re- 
corded value for grade .5 we cannot use the method 
just illustrated. However, we can derive the 
equation of a parabola, reproducing exactly the 
areas in each of the first three intervals. If 
the origin is 2.00, i.e., a point midway between
the first and second class indexes, and the fre-
quencies in these three classes are called f₁, f₂,
and f₃, the equation is

$$f = \tfrac{1}{6}\,(2f_1 + 5f_2 - f_3) + (f_2 - f_1)\,\xi + \tfrac{1}{2}\,(f_1 - 2f_2 + f_3)\,\xi^2 \qquad [4:06]$$

For the data in hand this is

$$f = 11368.8 - 30039\,\xi + 14453\,\xi^2$$

which, when f = 5000, gives ξ = .240, corresponding
to grade 2.240.
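
That a parabola of this form reproduces the three tabled frequencies as areas, and yields grade 2.240 at the 5000 level, can be confirmed numerically. The sketch assumes the intervals run from -1 to 0, 0 to 1, and 1 to 2 about the origin 2.00; the function names are illustrative.

```python
import math


def parabola_406(f1, f2, f3):
    """Coefficients (a, b, c) of f = a + b*xi + c*xi**2 as in [4:06]."""
    a = (2 * f1 + 5 * f2 - f3) / 6
    b = f2 - f1
    c = (f1 - 2 * f2 + f3) / 2
    return a, b, c


def area(coef, lo, hi):
    """Integral of a + b*xi + c*xi**2 from lo to hi."""
    a, b, c = coef
    return a * (hi - lo) + b * (hi**2 - lo**2) / 2 + c * (hi**3 - lo**3) / 3


coef = parabola_406(31206, 1167, 34)
a, b, c = coef
r = math.sqrt(b * b - 4 * c * (a - 5000))
roots = sorted([(-b - r) / (2 * c), (-b + r) / (2 * c)])
# The contour point lies between grades 1.5 and 2.5, i.e. xi between -.5 and .5.
xi = next(x for x in roots if -0.5 <= x <= 0.5)
grade = 2.0 + xi
print(coef, round(grade, 3))
```

The second root of the quadratic falls outside the stretch between the first two class indexes and is extraneous.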

It is interesting to note in passing that, in
a situation wherein f₁, f₂, f₃, and f₄ exist, if
f as given by [4:06] and f as given by a similar
equation involving f₂, f₃, and f₄ are averaged,
both being referred to the origin used in [4:05],
the resulting expression for f is [4:05].


Problem 10: Vocabulary Review 


bar diagram
basal year
block diagram
categorical
center of population
circle chart
class frequency
class index
class interval
class limits
contingency table
contour lines
cumulative frequencies
cycle
frequency polygon
geographic
golden mean
grid
growth curve
histogram
horizon
inflexion, point of
least squares
linear interpolation
logarithmic chart
median
minor mode
mode
normal population
ogive
parabolic interpolation
percentile
percentile curve
perspective
pictogram
pie chart
plot
price index
pseudo growth curve
quadratic mean
qualitative
quantitative
relative time chart
sample
scatter diagram
segmented bar
semi-logarithmic chart
skewed curve
spatial
standard deviation
standard error
temporal
time chart
trend
vanishing point
zero point, natural
σ (sigma)
Σ (capital sigma)
ξ (xi)
χ² (chi square)
P


CHAPTER V
THE STABLE FEATURES OF PHENOMENA
SECTION 1. THE QUEST FOR CERTAINTY


The yogi seeking peace in Brahma, the philo-
sophical idealist finding reality in his own
world of concepts, and the mathematician delving
ever deeper into the abstractions of pure mathe-
matics are united in abjuring the turmoil of
human passions and the strife and struggle of
competitive life, and each is seeking in his sepa-
rate way to find an anchor, a Rock-of-Ages, an
everlasting law of being. The stabilities of
life that each finds, or thinks he finds, are
self-satisfying and eternal in the mind of the
conceiver. Without questioning the existence and
reality of stabilities of this sort we can note
that their source is strictly and individually
psychological, depending from the logical, in-
stinctive, and habitual mental processes of the
individual. We may also note that the applica-
bility of universal and eternal principles and
beliefs to concrete issues is never unquestioned,
but in each instance subject to a judgment of
pertinence. The pertinence has to be established
by observation or experience and not by pure
deduction. The concepts of stability referred
to, but not their fields of practical import,
have a nonstatistical basis. Let us freely grant
that this field of life does exist, and that it
has an importance as judged by different indi-
viduals from supreme to trivial, and that it
lies beyond the pale of experimental statistics.

The need and the search for anchors in life
is not limited to philosophical idealists. They
are, perchance, in less need of them than other
folk, including avowed pragmatists and empiri-
cists. The stabilities that these latter seek
differ in three important respects from those
sought by the former: they are relative, not
absolute; they are observationally determined;
and they serve a finite purpose in an observably
limited realm. This is the field served by ap-
plied statistics.

The concept of relatively ultimate durability
or stability is valuable. Suppose it is reported
that "Herman Rosenwald is a ten-year-old Jewish
boy having an intelligence quotient of 130." The
age item may be reacted to in a fixed manner for
such a period of time as may be quite sufficient
for many immediate issues. That his intelligence
quotient is 130 is more stable temporally than
was his 10-year-oldness. But this concept,
consequent to some fallible measurement device
applied at some moment in time, must be thought
of as only relatively stable and only so with
reference to certain none too clearly defined
fields of functioning. That he is Jewish has
greater temporal stability, but is of variable
utility if used as an interpretative aid in de-
limiting mental and emotional conditions and
activity. Though each of these items has relative,
not ultimate, stability, nevertheless each is
undoubtedly serviceable within limits, which are
discoverable with greater or less precision.

Statistical study is primarily a search for
features of relative stability in connections
not known from earlier study to possess them.
This search should not be indiscriminate.


SECTION 2. BIOLOGICAL AND SOCIAL STABILITY 


That there is form to life has been proclaimed
by philosophers both early and late, and that it
is rooted in biological, psychological, and
social laws of survival and accommodation has
been clearly pointed out by Charles Darwin,
Lester Frank Ward, and others. The revelation of
such form is the prime purpose of statistical
study. Phenomena are conditioned in certain
inherent ways which, though not immediately
obvious, are compelling in their power to limit
the range of values, and the relative frequencies
of the different possible values, which can be
taken.

If a cow is tied to a stake by a stout elastic
tether, she is constricted in her grazing just
as really as if tied with a Manila rope. Gener-
ally speaking, the restrictions upon social and
biological phenomena are elastic, not rigid, and
the strengths, or lengths, or existence, of these
elastic bonds must be inferred by a study of
tendencies, not of rigid limits. Can we infer
the restricting effect of the elastic tether by
observations as to the thoroughness with which
the turf has been grazed? We can, but, primarily,
not by noting the most extreme tufts plucked.
Such a tuft might have had so succulent an appeal
that the cow charged for and attained it at a
distance quite beyond her usual grazing radius.
By a careful study of the stubble we can obtain
evidence not only as to the center of the grazing
area, but also as to the elasticity or vari-
ability of the restrictive influence. If cow,
tether, and stake are removed, and stake-hole
obliterated, we can approximately reconstruct
the situation from the evidence yielded by the
stubble, making close estimates of the location
of the stake and of the variably restrictive
influence of the tether.

Many phenomenological situations are charac-
terized by a central tendency with elastic bounds.
An oyster growing a thicker and thicker shell is
more and more adequately protected from enemies
but also more and more encumbered in movement.
There is thus an optimum shell thickness (that
is, a central tendency) coupled with decreasingly
permissible variability from it (that is, a
typical pattern of variability). The interest
rate on a mortgage of certain amount, with 100
shares of stock A as collateral, would not be a
fixed rate if there were many independent place-
ments of such a mortgage. The lower the rate the
fewer lenders, and the higher the rate the fewer
borrowers, with the result that there would be
a central tendency and a typical pattern of
variability from it. Children of a given age are
distributed throughout a number of school grades.
They have a central tendency of scholastic ability
and there exists a central tendency in school
grade attended. As their abilities are farther
and farther below the average there is greater
and greater pressure by school authorities to
lower the school grade attended, and as they
are more and more able there is greater and
greater individual effort to secure extra pro-
motions so as to profit by higher grade offer-
ings. Again a central tendency and a typical
pattern of variability. Innumerable illustrations
can be given.

There is no logical reason for believing that
successive increases of .1 millimeter in thick-
ness of oyster-shell above the central-tendency
value will have a protective influence just
equivalent in survival terms to the mobility
benefit of successive decreases of .1 millimeter
below the central-tendency value. The detri-
mental effect of increased shell weight is con-
ditioned by certain environmental conditions
(perhaps getting food) which determine the vari-
ability pattern upon the above-average-shell-
thickness side of the distribution, while the
detrimental effect of decreased shell weight is
consequent to other environmental conditions
(perhaps escaping enemies), and this may well
lead to a different variability pattern upon the
below-average side of the distribution. The net
result is a nonsymmetrical or skewed distri-
bution. The disinclination of lenders to lend
at low interest rates is a social phenomenon of
a different sort from that of the disinclination
of borrowers to borrow at high interest rates,
so again we should expect the patterns of vari-
ability below and above the average interest
rate to differ. Certainly the school pressure
to demote the dull is of a different order from
that of the individual urge to profit by extra
promotions if bright.

We conclude that the skewed distribution is
to be expected in physical, biological, psycho-
logical, and social life. We may also conclude
that a knowledge of central tendency, of vari-
ability, and of skewness (these last two to-
gether being a convenient summary of knowledge
separately of variability below the average and
above it) is knowledge of three characteristic
and quite independent aspects of phenomena which
yield distributions. An equivalent statement is
that these aspects are stable features of phe-
nomena of a certain type, recurring with slight
modification from sample to sample.

Compare the enlightenment which the statement,
"12.5-year-old Bessie is in the low seventh grade,"
carries to the untutored and to the statistically
informed auditor. The reaction of the first may
be "What of it?" and of the second, that "Bessie
is bright and ambitious; she and her parents
have progressively through a number of years
fought the lock-step regimen of the school;
she either entered school at an early age and
was bright enough to profit by it or has been
doubly promoted twice, for she is two half-grades
above the average, a status attained by less
than twenty per cent of her age-group; and her
scholastic achievement is about equal to that of
average low eighth-graders." The last of these
items, and in part the first, depends upon know-
ledge of a statistic other than the three men-
tioned, namely, that children who get extra
promotions tend to get them to a grade location
half as far above the average grade for their
age as their ability warrants. The statistically
informed has had knowledge of four stable features
of age-grade-achievement phenomena that the
untutored lacked. Furthermore, there are no
substitutes for this knowledge, for, lacking the
information inherent in the four statistical
items mentioned, one would find it impossible to
reach the same conclusion with equal certainty.
It of course must, in connection with a deduction
from these four items, as in the case of a de-
duction from any other information, be recognized
that there is an element of uncertainty in a
conclusion drawn; e.g., the inference from the
facts given that Bessie has the average scholastic
ability of low eighth-graders has a probable
error of approximately one-half a year.

Statistics should enable one to see the indi-
vidual event in a setting and thus to know it as
only he can who knows the background. The notion
that statistical study deadens sensibility to
individuality is utterly false. A person is more
truly individualized when a given set of items
about him are seen in the perspective of the
population to which he belongs than when viewed
as a unique event. To claim complete individu-
alization when viewed as a unique event asserts
the ignorance of the observer rather than the
individuality of the subject.

The balance that biologists repeatedly find
to exist in the forms of life under a certain
environment represents a type of stability dis-
covered, described, and attested to with a known
certainty by statistical study. The concept here
is not stability within, or of, a single distri-
bution, but stability of frequencies of oc-
currence of items of different sorts. The simp-
lest case exists when two forms of life interact
existentially. Ecology and parasitology are
sciences concerned with stable relationships
between living things. If a weevil, host to
some parasite, exists in large numbers, the living
conditions of the parasites are more favorable
and they multiply to the point of excessive
killing of their hosts, with the result that
they must die in large numbers from duress of
living conditions, with the result that the hosts
now multiply, and so it continues until some
natural balance is reached. Clearly, in this
situation, the relative frequencies of host and
parasite are crucial items of information; es-
pecially informative are these frequencies under
stable life conditions.

The first concern of marine biology is perhaps
the balance in the frequencies of the various
forms of life in the homogeneous environment
that the ocean provides. This balance can be
highly trusted unless some new factor, such as
man's predatory activities, changes things. In
a certain social structure there can exist to
mutual benefit so many milkmen to so many grocers,
to so many barbers, to so many shoemakers, etc.,
and if some number should happen to be excessive
it will tend to right itself. This same type of
phenomenon is precisely present in sundry chemical
reactions.

Throughout the entire field of physical, bio-
logical, and social life we find that the fre-
quencies, or proportions, in classes tend, under
stable conditions, to fixed values. The trust-
worthy discovery of these proportions in classes
is a fertile field of statistics. It is a phe-
nomenological stability other than that discussed
under form of distributions. However fixed these
proportions are under stable conditions, they
show typical growth or cyclical features under
changing conditions. The proportion of people
engaged in airplane manufacture has hardly at-
tained such a value that we can look upon it as
approximately fixed from year to year. For
comprehension we require fixed reference points.
If the proportion engaged in airplane manufacture
does not provide one we look further. We might
investigate if the rate of change of this pro-
portion is constant, or geometric, or definable
in some demonstrably precise terms. We must find
some anchorage or expect that none but hazardous
appraisals can be made. The logical need for
awareness of the stabilities in phenomena is the
need that statistics serve.

In studying proportions in classes and seek- 
ing evidence of stability therein it would seem 
necessary that the individuals in the classes be 
in some sense, or to some degree, interdependent. 
A study of the relative frequencies of cotton
boll weevils and their parasites would be profitable,
but a joint study of cotton boll weevils
and citrus tree scales would not. Just as there
are conflicting and interacting influences at
work in determining the form of distribution, so
there are in the matter of frequencies in classes.
We are interested in those situations where 
these frequencies do in some sense directly 
interact or are affected by common causes. 

In choosing cases for statistical study it is
safe to say that the appropriate members should
always be competitive or interacting individuals.
A certain study of the intelligence of hoboes 
and college graduates can be cited. The re- 
lationships reported, of course, had no guidance 
or classificatory importance, and time has shown 
them to have been very misleading to readers 
unaware of the triviality of relationships be- 
tween noncompetitive groups. 

We will attempt to classify the sorts of sta- 
bility possible in phenomena, admitting the 
likelihood of still other sorts not yet dis- 
covered or conceived. 


SECTION 3. TEMPORAL DATA 


Observations over a period of time may show 
(1) continuity of value from one moment to the 
next, (2) a general trend, (3) periodic fluctu- 
ations, (4) nearly constant proportionate re- 
lationships to some other variable, or variables, 
(5) evidences of limits below which, or beyond 
which, values cannot occur, (6) simplicities in 
any of the preceding five when some transfor- 
mation of the time variable is made. Presumably, 
knowledge of the existence of a temporal series 
is direct and initial. That is all that is ne- 
cessary to suggest the examination of the data 
along one or more of the six lines mentioned, 
with the hope of discovering its stable features. 
As soon as a type of stability is discovered, a 
foundation is laid for a study of causes and 
meanings. 

SECTION 4. GEOGRAPHICAL DATA


Observations which are intrinsically connected 
with position in two-dimensional space may show 
(1) continuity of value from point to neighboring 
point, (2) discontinuity from point to neighbor- 
ing point, (3) geographic patterns of frequency, 
(4) geographic patterns of categories, (5) direct 
or (6) inverse frequency relationship from point
to point, (7) patterns in combination types of
frequencies, (8) simplicities in any of the preceding
seven under transformations of space.
Knowledge of the existence of geographical data
is direct and initial and immediately suggests 
search for stabilities of the eight sorts listed 
and for their causes and meanings. 

Temporal and geographical: Many situations 
proclaim themselves as both temporally and geo- 
graphically conditioned, as for example do farm 
crops, and then stabilities involving all combi- 
nations of the sorts in the two types listed are 
within the realm of possibility. The problem of 
discovery of stable features is greatly comple- 
cated, but the general modes of search are indi- 
cated by the 6 X 8 combinations. 


SECTION 5. QUANTITATIVE AND QUALITATIVE DATA


Much data may be judged to be relatively 
independent of the time and place of collection 
in the sense that the issues involved are not 
dependent upon them, as, for example, would be 
a candidate's record upon a Civil Service examination.
With data of this sort we look for stable
features in the form of single distributions and 
in the relationships of two or more paired distri- 
butions. 

Qualitative series: The fundamental types of 
stability are (1) proportions in classes as 
successive samplings are taken, (2) relationships 
between these proportions, (3) existence for 
these proportions of upper and lower limits other 
than one and zero, (4) relationships of super- 
and subordination between classes,— such re- 
lationships may be invariable or tendential. 

A combination of stabilities of types (2) and 
(4) is revealed in many genetic chromosomal 
studies and similar conditions are to be expected 
in psychological vocational studies.

The statistics of sampling, association, and
contingency are well developed and provide keen-edged
tools for analysis of qualitative data.


Quantitative series: For quantitative series 
the fundamental types of stability reside in the 
parameters, or constants, which are definitive 
of a distribution or of distributions in their 
relationships. These are measures of (1) central 
tendency, (2) variability, (3) skewness, (4) 
kurtosis, (5) multimodality, (6) limits, (7) ex- 
cluded values, (8) stability of each as affected 
by principle of sampling employed, (9) stabilities 
revealed as transformations of the scale of 
measurement are made, (10) relationships in all 
these respects of two or more paired series. 

Generally items (1), (2), (3), (4), and (5) 
are of very unequal excellence in the information 
they give about a parent population. Many distri- 
butions are such that a sample of N cases will 
yield trustworthy information about, say, (1) 
and (2) and not about (3), (4), or (5). The
higher moments $\mu_3$, $\mu_4$, $\mu_5$, etc., have been proposed
as descriptive statistics in connection
with extremely skewed and leptokurtic series
when not as justified upon the ground of stability
as other neglected statistics. No matter
how neat a statistic may be algebraically, its 
real merit is dependent upon its stability, the 
precision with which it depicts a feature of the 
population. 

Amateurs err in coining and using an odd sta- 
tistic because of some simplicity, real or fan- 
cied, in meaning, irrespective of and in igno- 
rance of its instability. In illustration may 
be mentioned "the percentage of those getting A
in mathematics who also got A in law" as a measure
of correlation.

It must be admitted that professionals have at 
infrequent times erred in using a statistic
because of amenability to algebraic treatment
irrespective of, though scarcely in ignorance of,
its instability. In illustration may be mentioned
the use of the higher moments $\mu_3$, $\mu_4$, and even
higher ones, though other measures such as percentiles and
proportions in classes could reveal otherwise
undisclosed though trustworthy features of the
population.

Of two alternative statistics, each of suf- 
ficient reliability for samples of the size in 
question, we may well choose the one of greatest 
algebraic simplicity but never employ an un- 
reliable statistic no matter its algebraic sim- 
plicity. 

Quantitative and qualitative: Many situations 
combine the separate issues of the quantitative 
and qualitative series. One combination, well-nigh
universal, is found when quantitative differences
inhere in the very trait that has led to
categorization. If one class of fruit fly is
labeled "stubby wing" it will nevertheless be
found that there are different degrees of stubbiness
in those placed in this class; even in
that class of people labeled "male" it is found
that a variability in maleness exists. The same
issue arises in quantum-theory physics. The
standard statistical approach to this problem is
first to abstract all the information possible
by treating the data as qualitative, and then to
abstract still more information by superimposing
a quantitative treatment on this. Where such
added quantitative treatment is not immediately
feasible, the student should hold the strong
conviction that he has but partially (we may
even say but superficially) plumbed the issues.
In the field of psychology the quantification of
phenomena earlier thought of as qualitative has
been one of the most fruitful means of advance.
We may expect it to continue to be so both in
psychology and in other fields of physical and
social science.

Another common combination of the two is found
when quantitative and qualitative series enter 
into the single problem, as for example is the 
case where a person's intelligence (mainly quanti- 
tative as commonly measured) and his sex (quali- 
tative as commonly measured) affect his fitness 
for some task. Though such bastard devices as 
tetrachoric and biserial correlation have been 
developed to cope with this situation, the student 
cannot help but feel that these measures hold 
but a part of the information necessary for a 
true* solution of the problem. 

The qualification of seemingly quantitative 
phenomena is also a frequently illuminating 
process. The need for it would seem to exist 
where the so-called quantitative measure is 
largely subjective or hastily conceived, as for 
example is the case with measures of "general 
intelligence." 


SECTION 6. COMPLEX DATA 


Only the simpler types of complexity have 
been mentioned: temporal-geographical and
quantitative-qualitative. However, very real and
important problems involving all combinations of 
the various series exist. It is, in fact, quite 
arguable that every real problem does involve 
them all, and never one, two, or three only. 
There are two standard approaches to the handling 
of such problems. The generally preferred and 
more convincing one is so to set up experimental 
situations that variability in phenomena due to 
all but one of the types is negligibly small, so 
that the statistics of a single series may be 
employed. The other method employs the analysis
of variance techniques and allocates to each
type the variability attributable to it. This is
a beautiful and powerful technique, but its merit
is sufficiently restricted by its necessary
complexity, when the data are complex, to warrant
a serious search for experimental set-ups
involving fewer issues.

* Ascertainable "truth" being relative, the word is used to
indicate a higher rather than a lower stage, and not an
absolute attainment.


SECTION 7. EMPLOYABLE INSTRUMENTS IN THE QUEST 
FOR CERTAINTY 


Correlative with each type of stability is a 
statistic (or are statistics, in the case al- 
ternative methods are available) which meets the 
issue. These become apparent with statistical 
expertness. We give a few examples herewith: 
Temporal phenomena as listed in Section 3 numbers 
(1) and (2) are served by time charts, (3) by 
periodogram analysis and periodic trigonometric 
functions, (4) by ratios and index numbers, and 
(6) by growth curves and other transformations. 
Geographic phenomenon (1) is served by contour 
maps, (2) by spot maps, (3) and (4) by super- 
imposed maps. Qualitative phenomenon (1) is 
served by proportions in classes and their stan- 
dard errors (standard errors are important in all 
connections, but perhaps most strikingly so here), 
(2) by correlations between class frequencies in 
excess of that due to chance, (3) by association 
methods. Quantitative phenomenon (8) is served 
by Lexis’ ratios and Fisher's variance-ratio 
tests, (9) by normalizing data and other non- 
linear transformations, (10) by sundry measures 
of simple and multiple correlation, (9) and (10) 
by multivariate analysis, including mental factor 
analysis. 

This list is in no sense exhaustive, but just 
a start which the student will find it profitable 
to amplify as his knowledge of techniques expands.


CHAPTER VI
MEASURES OF VARIABILITY

SECTION 1. THE GENERAL CONCEPT OF VARIABILITY


It is axiomatic that if all measures of a 
series are and must be alike there is nothing to 
investigate or report. The primary phenomenon 
of a series is that there are differences between
the measures. We will later see that a deduction 
from this is that the measures have a mean (see 
[6:03]). It seems to the writer that the concept 
of variability is primary and that of central 
tendency secondary, but, as will be shown, one 
cannot think of the one in any but the most 
casual manner without being compelled to think 
of the other. They are as wedded as are the 
facts that a right triangle has three sides such 
that the sum of the squares upon two is equal to 
that upon the third, and it has three angles 
such that the sum of two is equal to the third. 
We now turn our attention to certain necessary 
relationships between the measures in a series. 

To assist a student who has difficulty in 
dealing with algebraic symbols, a numerical illustration
of quantities represented in Tables VI A,
VI B, and VI C immediately accompanies each of
these tables. This numerical illustration need
not be noted unless desired. The illustrative 
series is composed of the following four measures, 
8, 5, 2, and 1, which have a mean of 4, a vari- 
ance of 7.5, and a variance of differences of 20. 
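
The three figures quoted for the illustrative series can be checked directly. The following is a minimal sketch in Python (a language chosen here for illustration; it is not part of the original text), and the variable names are assumptions of the sketch. It forms every ordered difference between two measures, null ones included, and divides the sum of squares by the number of non-null differences.

```python
from itertools import product

measures = [8, 5, 2, 1]   # the illustrative series
n = len(measures)

mean = sum(measures) / n
# Variance of the series: mean square of deviations from the mean.
variance = sum((x - mean) ** 2 for x in measures) / n

# All N*N ordered differences, including the N null ones (a measure minus itself).
differences = [a - b for a, b in product(measures, repeat=2)]
# The null differences add nothing to the sum, so only the denominator
# has to exclude them: N*N - N differences are counted.
variance_of_differences = sum(d ** 2 for d in differences) / (n * n - n)

print(mean, variance, variance_of_differences)
```

The sum of the squared differences is 240 and there are 12 non-null differences, which gives the variance of differences of 20 stated above.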

We will, then, start with the most elementary
concept of variability possible, namely, that
two measures of a series, $X_1$ and $X_2$, differ. We
will call the fact that $(X_1 - X_2) = d_{12}$, a
quantity not zero, the most elemental or primitive
measure of variability. If we have a series
$X_1, X_2, X_3, \ldots, X_N$, there are many such measures,
as given in Table VI A herewith, in which the
null measures of the type $(X_1 - X_1)$ must not be
counted, but are included because they lead to
later simplifications. We can let $X$ stand for
any measure of the series $X_1, X_2, X_3, \ldots, X_N$,
but when it is necessary to distinguish between
two such measures we require a different notation
and shall let $X_i$ stand for any of the X's and $X_j$
any measure not $X_i$, and we shall let $X_I$ represent
any measure including $X_i$. We clearly desire, as
a comprehensive measure of variability for the
entire series, some function of all possible
differences $(X_i - X_j)$.


TABLE VI A

ALL POSSIBLE DIFFERENCES                         ILLUSTRATIVE NUMERICAL DATA

X_1-X_1   X_2-X_1   X_3-X_1   . . .   X_N-X_1    8-8   5-8   2-8   1-8
X_1-X_2   X_2-X_2   X_3-X_2   . . .   X_N-X_2    8-5   5-5   2-5   1-5
X_1-X_3   X_2-X_3   X_3-X_3   . . .   X_N-X_3    8-2   5-2   2-2   1-2
 . . .
X_1-X_N   X_2-X_N   X_3-X_N   . . .   X_N-X_N    8-1   5-1   2-1   1-1


There are $N^2$ differences in
Table VI A, but $N$ of them, being null, do not
count. We may thus write down the following
generalized measure of variability:

$$f = \frac{\sum^{n^2-n} |X_i - X_j|^m}{N^2 - N}$$   [6:01]

For typographical reasons $N$, when a superscript
or subscript, is in lower case, but otherwise is
written as a capital. The $n^2-n$ over the $\Sigma$ symbol
indicates that there are $N^2-N$ terms in the summation.
If nothing is written over the $\Sigma$, the
reader is to understand that there are $N$ addends.
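
The generalized measure [6:01] can be tried for several values of m. The following is a minimal sketch in Python, not part of the original text; the function name `generalized_variability` is an assumption of the sketch. It averages the m'th power of the absolute difference over the N² - N ordered pairs of distinct measures.

```python
from itertools import permutations

def generalized_variability(measures, m):
    """Mean of |X_i - X_j|**m over the N*N - N ordered pairs with i != j."""
    n = len(measures)
    # permutations(..., 2) yields every ordered pair of distinct positions,
    # so tied values are still counted as separate measures.
    total = sum(abs(a - b) ** m for a, b in permutations(measures, 2))
    return total / (n * n - n)

series = [8, 5, 2, 1]
for m in (1, 2, 3):
    print(m, generalized_variability(series, m))
```

With m = 2 the function returns the variance of differences, 20, for the illustrative series. With m = 1 it returns the mean absolute difference between measures, which is 4 for this series.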

This function, f, is worthy of investigation 
for different values of m. There may be beauties 
in measures of variability as yet undiscovered 
and unreported, but the writer has found the 
function rather intractable for all values of m
except $m = 2$. For this case $|X_i - X_j|^m = (X_i - X_j)^2$
and the function has very simple and remarkable
properties. The function, in this case, is a
variance, for, by definition, a variance is the
mean square of quantities whose mean is zero.
(A standard deviation is the square root of a
variance.) Since $(X_i - X_j) = -(X_j - X_i)$ and for
every $(X_i - X_j)$ in Table VI A there is in some other
position a $(X_j - X_i)$, it follows that the mean of
the differences in Table VI A is 0 and accordingly
the mean of their squares is a variance. We may
write

$$V_d = \frac{\sum^{n^2-n} (X_i - X_j)^2}{N^2 - N}$$

The expression $V_d$ stands for the variance of the
differences and not for $V$ times $d$. The symbol
$\sigma^2$ is commonly used and is appropriate, but variances
enter into formulas so frequently that it
is advantageous to have them represented by a
symbol without subscript or superscript; thus
$V(d)$ is typographically simpler than its identical
equal $V_d$ or $\sigma_d^2$. The summation term is exactly
the sum of the squares of the differences
given in Table VI A. For the first column we
have the magnitudes given in Table VI B.


TABLE VI B

SQUARES OF DIFFERENCES IN             ILLUSTRATIVE NUMERICAL DATA
FIRST COLUMN OF TABLE VI A

X_1^2 - 2 X_1 X_1 + X_1^2             64 - 2×8×8 + 64
X_1^2 - 2 X_1 X_2 + X_2^2             64 - 2×8×5 + 25
X_1^2 - 2 X_1 X_3 + X_3^2             64 - 2×8×2 +  4
 . . .
X_1^2 - 2 X_1 X_N + X_N^2             64 - 2×8×1 +  1


Summing these we obtain the first row of Table
VI C, the remaining rows being the sums corresponding
to the second, third, etc., columns of
Table VI A.


TABLE VI C

THE DIFFERENCES OF TABLE VI A SQUARED       ILLUSTRATIVE NUMERICAL DATA

N X_1^2 - 2 X_1 ΣX_j + ΣX_j^2               4×64 - 2×8×16 + 94
N X_2^2 - 2 X_2 ΣX_j + ΣX_j^2               4×25 - 2×5×16 + 94
N X_3^2 - 2 X_3 ΣX_j + ΣX_j^2               4× 4 - 2×2×16 + 94
 . . .
N X_N^2 - 2 X_N ΣX_j + ΣX_j^2               4× 1 - 2×1×16 + 94


The sum of all these yields $N \sum X_i^2 - 2(\sum X_i)^2 + N \sum X_j^2$,
yielding finally

$$V_d = \frac{2\left[N \sum X_i^2 - (\sum X_i)^2\right]}{N^2 - N}$$   Variance of differences between the measures in a series   [6:02]

As, by definition, the mean, $M$, of a series is
their sum divided by their number, we may write
$\sum X = NM$ and [6:02] may be written

$$V_d = \frac{2\left[\sum X^2 - N M^2\right]}{N - 1}$$   Variance of differences between the measures in a series   [6:03]

As there is no ambiguity the $i$ as a subscript of
$X$ has been dropped.

Though no mention of the mean was present in 
the statement of the problem, we find it turning 
up as an essential statistic in [6:03].

Computation by formula [6:03] is simple, 
whereas it would be laborious by squaring the 
values of Table VI A. 
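
The saving in labor can be seen by computing the variance of differences three ways: by squaring every entry of the table of differences, by [6:02], and by [6:03]. The following is a minimal sketch in Python, not part of the original text; the three function names are assumptions of the sketch.

```python
def variance_of_differences_direct(xs):
    """Square all N*N differences and divide by the N*N - N non-null ones."""
    n = len(xs)
    return sum((a - b) ** 2 for a in xs for b in xs) / (n * n - n)

def variance_of_differences_602(xs):
    """Formula [6:02]: needs only the sum and the sum of squares."""
    n = len(xs)
    return 2 * (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * n - n)

def variance_of_differences_603(xs):
    """Formula [6:03]: the same quantity written with the mean M."""
    n = len(xs)
    m = sum(xs) / n
    return 2 * (sum(x * x for x in xs) - n * m * m) / (n - 1)

series = [8, 5, 2, 1]
print(variance_of_differences_direct(series),
      variance_of_differences_602(series),
      variance_of_differences_603(series))
```

For the illustrative series the sum is 16 and the sum of squares is 94, so [6:02] gives 2(4×94 - 256)/12 = 20 and [6:03] gives 2(94 - 64)/3 = 20, agreeing with the direct computation.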

We may express $V_d$ in different notation.
The variance of the X's is by definition the sum
of the squares of their differences from their
mean divided by their number. This follows from
the easily proven and never-to-be-forgotten fact
that $\sum (X - M) = 0$. Thus by definition

$$V = \frac{\sum (X - M)^2}{N}$$

As a simple exercise the reader may prove that
$\sum (X - M)^2 = \sum X^2 - N M^2$, so that

$$V = \frac{\sum X^2 - N M^2}{N}$$   The variance of a distribution   [6:04]

Also by definition

$$V = \sigma^2$$   The variance of a distribution   [6:05]

$$\sigma = \sqrt{V}$$   The standard deviation of a distribution   [6:06]

We may now write [6:03] thus

$$V_d = \frac{2N}{N - 1} V$$   [6:07]

The most widely used measure of variability
is $V$ or its square root $\sigma$, but, because of the
factor $N/(N-1)$ there is not a constant relationship
between $V_d$ and $V$, though it is nearly
constant if $N$ is large. $V_d$ is the preferred
measure (see [6:08]) because $\bar{V}_d = V_d$ but $\bar{V} \neq V$.


The structure of the symbol $\bar{V}$ should carefully
be noted. $V$ stands for variance. Had one the
true variance, or that for the infinite population
of which the $N$ cases are a random sample,
we would write it $\tilde{V}$. We never have $\tilde{V}$ except
hypothetically, for we never deal with samples
of infinite size. A bar over a symbol has one
of two meanings: it indicates either a mean or
an unbiased estimate. Thus $\bar{V}$ is an estimate of
the population variance. The tilde to represent
a population value and the bar to represent an
unbiased estimate or a mean will be used throughout
this text.

Though the variance, $V$, obtained from the
sample, is not an unbiased estimate of the population
variance, the variance of the differences
given by the sample, $V_d$, is an unbiased estimate
of the variance of the differences that would be
found in the infinite population.

$$\bar{V}_d = V_d$$   Unbiased estimate of the variance of differences in the population   [6:08]

The unbiased estimate of the population variance
is a function of the sample variance and
the number of cases in the sample, thus,

$$\bar{V} = \frac{N}{N - 1} V = \frac{1}{2} V_d$$   Unbiased estimate of population variance   [6:09]

Letting $\tilde{V}$ and $\tilde{V}_d$ be the means of the $\bar{V}$'s and
of the $V_d$'s found in an infinite number of similar
samples, [6:09] becomes [6:10], showing the
relationship between true measures of variability.

$$\tilde{V} = \frac{1}{2} \tilde{V}_d$$   [6:10]

Recapitulating the development we note (a) we
started with the elementary concept that the
variability of a series of measures is a function
of all the possible differences between them;
(b) that the algebraically simplest function of
this sort is the one yielding the variance of
the differences between the measures; (c) that
the computational procedure for obtaining this
calls for a new statistic, the mean; (d) that
$V_d$ is $2N/(N-1)$ times the variance, $V$, of the
sample, which is a universally employed statistic;
(e) that $V_d$ itself is an unbiased statistic
requiring no modification depending upon the size
of sample employed; (f) that one-half $V_d$, or its
equal $NV/(N-1)$, is an unbiased estimate of the
population variance.
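
Points (e) and (f) can be checked exactly on a small scale. The following is a minimal sketch in Python, not part of the original text. It assumes, purely for illustration, an infinite population in which the four values 8, 5, 2, 1 are equally likely, so that every ordered sample of three drawn with replacement is equally probable. Averaging over all 64 such samples gives exact expectations, and fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [8, 5, 2, 1]   # assumed equally likely values of an infinite population
sample_size = 3             # an arbitrary small N chosen for the illustration

def variance(xs):
    """V: sum of squared deviations from the mean, divided by N."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((x - m) ** 2 for x in xs) / n

def variance_of_differences(xs):
    """V_d: sum of all squared differences, divided by N*N - N."""
    n = len(xs)
    return Fraction(sum((a - b) ** 2 for a in xs for b in xs), n * n - n)

# Every ordered sample of the given size, each equally probable.
samples = list(product(population, repeat=sample_size))
mean_V = sum(variance(s) for s in samples) / len(samples)
mean_Vd = sum(variance_of_differences(s) for s in samples) / len(samples)
true_V = variance(population)

print("population variance:", true_V)
print("average sample V:", mean_V)
print("average sample V_d / 2:", mean_Vd / 2)
```

Under these assumptions the average of V falls short of the population variance by the factor (N-1)/N, whereas the average of one-half V_d equals it.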

SECTION 2. DEGREES OF FREEDOM 

Because of the simplicity with which it follows
from properties of $V_d$, the matter of degrees
of freedom is discussed at this point. If we
have three independent measures $X_a$, $X_b$, $X_c$,
knowledge of one conveys no information as to
the values of the others, singly or jointly, so
we have three degrees of freedom. In Table VI A
there are $N^2$ measures. How many of these are
independent? Obviously we know the exact value
of the null measures so they provide no degrees
of freedom. Having $X_1 - X_2$ gives us no information
as to $X_1 - X_3$, $X_1 - X_4$, $\ldots$, $X_1 - X_N$, so there are
exactly $N - 1$ degrees of freedom represented by the
first column. The first measure in the second
column, $X_2 - X_1$, is the negative of $X_1 - X_2$ and is
thus not independent. The third measure, $X_2 - X_3$,
is derivable from measures in the first column,
for it equals $(X_1 - X_3) - (X_1 - X_2)$ and is thus not
independent. Nor are any of the remaining measures
in this column. Clearly there are no measures
independent of those in the first column to
be found anywhere in the entire table so, though
there are $N^2$ measures in the table, there are
only $N - 1$ degrees of freedom. Also we may observe
that since there are $N - 1$ degrees of freedom in
$V_d$, and since

$$V = \frac{N - 1}{2N} V_d$$

there are $N - 1$ degrees of freedom in $V$ and also
in $\sigma$.

In the computation of $V$ we had $V = [\sum (X - M)^2]/N$
and we noted that this is a biased statistic,
but if we divide the summation by the number of
degrees of freedom instead of by $N$, we have
$\bar{V} = [\sum (X - M)^2]/(N - 1)$, which is an unbiased
statistic. In general, summations are to be divided
by the number of degrees of freedom, not by the
number of cases, to obtain unbiased statistics.
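
One way to make the count of N - 1 concrete is to treat each difference in the table as a linear combination of the X's and to ask how many of these combinations are linearly independent. This reading of "independent" as linear independence is an assumption of the illustration, not a statement from the text. The following is a minimal sketch in Python; the helper names `rank` and `difference_rows` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in rows]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row, not yet used as a pivot, with a nonzero entry here.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def difference_rows(n):
    """Coefficient vectors of all n*n differences X_i - X_j, null ones included."""
    rows = []
    for i in range(n):
        for j in range(n):
            row = [0] * n
            row[i] += 1
            row[j] -= 1
            rows.append(row)
    return rows

for n in (3, 4, 7):
    print(n, len(difference_rows(n)), rank(difference_rows(n)))
```

For each N tried, the table holds N² differences but their rank is N - 1, in agreement with the count reached above from the first column alone.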

Is the mean an unbiased statistic?
$M = (X_1 + X_2 + X_3 + \ldots + X_N)/N$. Since the various X's are
independent, there are just $N$ degrees of freedom
and $M$ is unbiased. We may write

$$\bar{M} = M = \frac{\sum X}{N}$$   An unbiased estimate of the population mean   [6:11]


Certain statistical manipulations deduct more 
degrees of freedom than do others. For example,
if we take differences of differences we have
$N - 2$ degrees of freedom, though the number of
such is $N^4$ when we include the null values, and
it is $N^4 - 2N^3 + N^2$ when they are excluded.

The student of statistics will find that this
matter is important in connection with small
samples and in connection with sundry derived
statistics wherein algebraic restrictions have
been imposed upon the data.


SECTION 3. THE VARIANCE AND THE STANDARD DEVIATION 
OF THE MEAN 


We found that $V_d$ and $V$ are interchangeable in
the sense that having one, and the number of
cases in the sample, we can immediately obtain
the other. Thus either may be taken as a measure
of variability of the data. Just as variability
of the data is its chief phenomenon, so variability
of the mean (not its absolute value) is
its prime characteristic. What matters the value
of a mean if it is so quixotic that it cannot be
trusted? We shall now obtain the variance of
the difference between means and the variance of
means. If $K$ is the number of samples, we have $K$
means and obviously (see [6:07])

$$V(M_i - M_j) = \frac{2K}{K - 1} V_M$$   [6:12]

and as the number of samples approaches infinity
we may write

$$\tilde{V}(M_i - M_j) = 2 \tilde{V}_M$$   [6:13]
The error in any single mean, say $M_1$, is $M_1 - \tilde{M}$.
We call this $\Delta_1$ and write $\Delta_1 = M_1 - \tilde{M}$. Clearly
$\sum^{k} \Delta / K$ tends toward zero, for each $M_k$ is an
unbiased estimate of $\tilde{M}$. By definition of a variance

$$\frac{\sum^{k} \Delta^2}{K} = V_\Delta = V_M$$

If $\tilde{M}$ is subtracted from each $X$ in Table VI A,
the values of the differences are unchanged, for
$(X_i - \tilde{M}) - (X_j - \tilde{M}) = X_i - X_j$. We now define $x$, a
score as a deviation from the population mean, by

$$x = X - \tilde{M}$$   A deviation-from-the-true-mean score   [6:14]


For the first sample of $N$ cases we have

$$\frac{\sum x}{N} = \Delta_1$$

and similarly for successive samples. Since
$x_i - x_j = X_i - X_j$ we may substitute in Table VI A and
continue the development as before, obtaining
the first equation of Table VI D as the comparable
formula to [6:03].

The subscript 1 has been added to indicate 
that measures from the first sample only are 


involved. For the K samples we have the equations 
of Table VI D. 


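TABLE VI D

EQUATIONS GIVING VARIANCES OF DIFFERENCES BETWEEN MEASURES FOR
K SUCCESSIVE SAMPLES OF N EACH

$(N-1) V_{d_1} = 2 \sum x_1^2 - 2N \Delta_1^2$
$(N-1) V_{d_2} = 2 \sum x_2^2 - 2N \Delta_2^2$
$(N-1) V_{d_3} = 2 \sum x_3^2 - 2N \Delta_3^2$
 . . .
$(N-1) V_{d_K} = 2 \sum x_K^2 - 2N \Delta_K^2$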

We shall now sum these equations, as $K$ approaches
infinity, first noting three simplifications: (1)
that each of the $V_d$'s is an unbiased estimate of
$\tilde{V}_d$, and so clearly their average approaches $\tilde{V}_d$;
(2) that the sum of the $K$ terms $\sum x^2$ contains $NK$
squared deviations from the population mean, and so
approaches $NK \tilde{V}$; and (3) that $\sum^{k} \Delta^2$ approaches
$K \tilde{V}_M$. Thus we obtain

$$(N - 1) K \tilde{V}_d = 2NK \tilde{V} - 2NK \tilde{V}_M$$

Utilizing the relationships $\tilde{V}_d = 2\tilde{V}$ and
$\bar{V} = [N/(N-1)] V$, we obtain

$$\tilde{V}_M = \frac{\tilde{V}}{N}$$   True variance error of the mean   [6:15]

If in place of the unknown $\tilde{V}$ we use an unbiased
estimate of it, we have the accompanying highly
serviceable formula, in which $V_M$ is the usual
and shorter notation for $\bar{V}_M$.

$$V_M = \frac{V}{N - 1}$$   Variance error of the mean   [6:16]

It would be more accurate to designate this as 
the "approximate variance error of the mean," 
but as every variance error, or standard error, 
formula based on the observed statistics is ap- 
proximate, we omit the word "approximate," though 
the student should always be aware that a sample 
estimate has been employed in lieu of the unknown 
population statistic. 
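
The relation between [6:15] and [6:16] can be checked exactly on a small scale. The following is a minimal sketch in Python, not part of the original text. It assumes, for illustration only, an infinite population in which the values 8, 5, 2, 1 are equally likely, and it enumerates every ordered sample of three drawn with replacement. The function name `variance_error_of_mean` is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import product

def variance(xs):
    """V: sum of squared deviations from the mean, divided by the number of cases."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((x - m) ** 2 for x in xs) / n

def variance_error_of_mean(sample):
    """Formula [6:16]: the sample variance divided by N - 1."""
    return variance(sample) / (len(sample) - 1)

population = [8, 5, 2, 1]   # assumed equally likely values
sample_size = 3             # an arbitrary small N chosen for the illustration

samples = list(product(population, repeat=sample_size))
means = [Fraction(sum(s), sample_size) for s in samples]

# [6:15]: the variance of the means of all possible samples.
true_variance_of_mean = variance(means)
# The average, over all samples, of the estimate given by [6:16].
average_estimate = sum(variance_error_of_mean(s) for s in samples) / len(samples)

print(true_variance_of_mean, variance(population) / sample_size, average_estimate)
```

Under these assumptions the variance of the sample means equals the population variance divided by N, and the estimate from [6:16], though it varies from sample to sample, averages to the same value.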

Noting [6:12], we may also write

$$V(M_i - M_j) = \frac{2V}{N - 1}$$   Variance error of difference between means of samples from the same population   [6:17]

As the square root of any variance error is a
standard error, [6:16] yields

$$\sigma_M = \frac{\sigma}{\sqrt{N - 1}}$$   Standard error of the mean   [6:18]

By a slight extension of [6:17] we have for two
means, based upon samples of different size and
drawn from different parent populations:

$$V(M_1 - M_2) = \frac{V_1}{N_1 - 1} + \frac{V_2}{N_2 - 1}$$   Variance error of difference between independent means   [6:19]


Knowledge of $V_M$ or $\sigma_M$ is necessary to an understanding
of the mean. Those who interpret means
without knowledge of their standard errors assume,
albeit unconsciously, that $\sigma_M$ is small, for otherwise
the mean is untrustworthy. For every refined
judgment and for every debatable issue this essential
item must not be assumed, and it need
not be, for [6:18] is available and simple to 
use. 
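
As an indication of how little labor [6:18] and [6:19] require, the following is a minimal sketch in Python, not part of the original text. The function names are assumptions of the sketch, and the second series, 6, 4, 2, is a hypothetical one introduced only to have two samples of different size.

```python
import math

def variance(xs):
    """V: sum of squared deviations from the mean, divided by N."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def standard_error_of_mean(xs):
    """Formula [6:18]: sigma divided by the square root of N - 1."""
    return math.sqrt(variance(xs) / (len(xs) - 1))

def variance_error_of_difference(xs, ys):
    """Formula [6:19]: two independent means from samples of different size."""
    return variance(xs) / (len(xs) - 1) + variance(ys) / (len(ys) - 1)

first = [8, 5, 2, 1]    # the illustrative series of this chapter
second = [6, 4, 2]      # a hypothetical second sample

print(standard_error_of_mean(first))
print(variance_error_of_difference(first, second))
print(math.sqrt(variance_error_of_difference(first, second)))
```

The last line printed is the standard error of the difference between the two means, the square root of the variance error. Per the advice of this section, such figures need be carried to no more than two significant figures.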

The reader may frequently find formulas for 
the preceding standard and variance errors having 
N instead of N-1 in the denominators. This procedure
is not recommended (for $\bar{V}$ is a better
estimate of $\tilde{V}$ than is $V$), though no error
of interpretation is likely to result for samples
even as small as N=12. A standard or variance
error is never needed to more than two significant
figures, and a one-significant-figure answer
usually suffices.


SECTION 4. SAMPLE AND POPULATION MOMENTS


Since the mean and variance are so mutually 
cooperative in revealing stable features of a 
distribution, let us consider a computational 
method yielding both. The standard method of 
computation provides the basis for getting higher 
moments as well. We define these as follows: 


$$\frac{\sum (X - P)^k}{N}$$   The k'th moment from the fixed point P   [6:20]


P may be any point not dependent upon the values 
of X in the sample. A common case arises when
P = 0. If, in this case, k = 1, then obviously
[6:20] yields the mean. In other words, the
mean is the first moment from the zero point of
the scale of measurement of the X scores. If we
replace P by the variable point M (variable because
the mean changes from sample to sample) we
have the k'th moment from the mean, which is
usually referred to as simply the k'th moment.
If moments from some point P, other than M, are
involved, they must always be designated moments
from P.

$$\mu_k = \frac{\sum (X - M)^k}{N}$$   The k'th moment   [6:21]


We note that the first moment = 0; that the second moment = V,
which may be designated $\mu_2$; that the third moment,
designated $\mu_3$, equals zero in a symmetrical distribution,
but not otherwise, and thus is a measure of asymmetry; that the
fourth moment, $\mu_4$, measures a variability characteristic
unrelated to asymmetry and different from V (= $\mu_2$) in that
it is more subject to the influence of extreme measures than is
V. This phenomenon, measured by the quotient $\mu_4/V^2$ (see
[7:03]) is called the kurtosis of a distribution and is high
when the distribution has long tails and a pronounced mode.
Otherwise expressed, if the parts of the curve intermediate
between the mode and the tails are called the hips, low,
intermediate, and high hip heights correspond to leptokurtic,
mesokurtic, and platykurtic distributions.
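
The definitions [6:20] and [6:21] can be tried out on a few numbers. The sketch below is illustrative only: the function names and the two small data sets are assumptions made for the example, not taken from the text. It computes M, V, $\mu_3$ and $\mu_4$ with N in the denominators and forms the kurtosis quotient $\mu_4/V^2$.

```python
def central_moments(xs):
    """Return (M, V, mu3, mu4): the mean and the 2nd to 4th moments
    from the mean, each with N in the denominator as in [6:21]."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    v, mu3, mu4 = (sum(d ** k for d in devs) / n for k in (2, 3, 4))
    return mean, v, mu3, mu4


def kurtosis_quotient(xs):
    """The quotient mu4 / V^2 used in the text to measure kurtosis."""
    _, v, _, mu4 = central_moments(xs)
    return mu4 / v ** 2


# Hypothetical data: a short symmetric series and a long-tailed one.
symmetric = [1, 2, 3, 4, 5]
long_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

print(central_moments(symmetric))
print(kurtosis_quotient(symmetric), kurtosis_quotient(long_tailed))
```

The symmetric series gives a third moment of zero, and the series with two extreme measures gives much the larger quotient, which is the behaviour the paragraph describes.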

Higher moments beyond the fourth will not here concern us,
though we will state without proof that the variance error of
any moment depends upon moments up to one twice as high as
itself. We have already found that the variance error of the
first moment, the mean, depends upon V, the second moment. The
variance of V depends upon moments as high as $\mu_4$, and
similarly for higher moments. An unhappy feature of the higher
moments is their instability, as is immediately obvious from
the fact that a single measure, if extreme, will greatly affect
their values.

In moment notation the $\mu$'s stand for moments
from the mean, thus $\mu_1$ = 0, a quantity that never
needs to enter into a formula. However, the mean
does enter in, and it has been customary to 
represent it in moment notation either as Шу or 


The notation here used is related to that of Fisher (1925 et
seq. and 1928) and to that of Yule and Kendall (1937) as follows:


AS HERE USED    AS USED BY FISHER    AS USED BY YULE AND KENDALL


N n N 
X x Х 
& & 
Arb. or A 
x at times(x-X), x 
at other times X 
И=Й и 
у= 02 = ГА ГА = о? 
ENS 
УТУ 


ао ee Е CE 


It will be noticed that all theoretical, or population,
statistics are here designated by tildes and that Fisher
represents them by Greek letters. Fisher's sample k-statistics
are unbiased estimates of the population $\kappa$ statistics and
are closely related to Thiele's seminvariants.

$\mu_1'$ or $m_1'$. We shall avoid these and represent
the mean by M.


Except the mean, none of the raw moments are unbiased
statistics. The accompanying formulas show the extent of this
bias for all moments, and combination of moments, up to the
fourth degree. A vinculum over a statistic, or product of
statistics, indicates a mean value, obtained by averaging for
very many samples of N each.


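$\overline{M} = \tilde{M}$    [6:22]

$\overline{M^2} = \tilde{M}^2 + \frac{\tilde{V}}{N}$    [6:23]

$\overline{M^3} = \tilde{M}^3 + \frac{3\tilde{M}\tilde{V}}{N} + \frac{\tilde{\mu}_3}{N^2}$    [6:24]

$\overline{M^4} = \tilde{M}^4 + \frac{6\tilde{M}^2\tilde{V}}{N} + \frac{4\tilde{M}\tilde{\mu}_3}{N^2} + \frac{\tilde{\mu}_4 + 3(N-1)\tilde{V}^2}{N^3}$    [6:25]

$\overline{V} = \frac{N-1}{N}\tilde{V}$    See [6:10]

$\overline{V^2} = \frac{(N-1)^2}{N^3}\tilde{\mu}_4 + \left(1 - \frac{3N^2-5N+3}{N^3}\right)\tilde{V}^2$    [6:26]

$\overline{\mu_3} = \frac{(N-1)(N-2)}{N^2}\tilde{\mu}_3$    [6:27]

$\overline{\mu_4} = \frac{(N-1)(N^2-3N+3)}{N^3}\tilde{\mu}_4 + \frac{3(N-1)(2N-3)}{N^3}\tilde{V}^2$    [6:28]

$\overline{MV} = \frac{N-1}{N}\tilde{M}\tilde{V} + \frac{N-1}{N^2}\tilde{\mu}_3$    [6:29]

$\overline{M^2V} = \frac{N-1}{N}\tilde{M}^2\tilde{V} + \frac{2(N-1)}{N^2}\tilde{M}\tilde{\mu}_3 + \frac{N-1}{N^3}[\tilde{\mu}_4 + (N-3)\tilde{V}^2]$    [6:30]

$\overline{M\mu_3} = \frac{(N-1)(N-2)}{N^2}\tilde{M}\tilde{\mu}_3 + \frac{(N-1)(N-2)}{N^3}(\tilde{\mu}_4 - 3\tilde{V}^2)$    [6:31]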
N? N? 


It is to be noted that for a normal distribution, in which
$\tilde{\mu}_3 = 0$ and $\tilde{\mu}_4 = 3\tilde{V}^2$, most of
the preceding formulas simplify greatly. After the student has
studied correlation he will find the last three formulas useful
in obtaining the correlation between mean and variance, between
mean-squared and variance, and between mean and third moment.
From formulas [6:22], [6:10], [6:27], [6:28] and [6:26] we
compute the following unbiased estimates of the first four
population moments:


$M$    An unbiased estimate of the population mean, see [6:11]

$\frac{N}{N-1}V$    An unbiased estimate of the population variance, see [6:09]

$\frac{N^2}{(N-1)(N-2)}\mu_3$    An unbiased estimate of the population third moment [6:32]

$\frac{N}{(N-1)(N-2)(N-3)}[(N^2-2N+3)\mu_4 - 3(2N-3)V^2]$    An unbiased estimate of the population fourth moment [6:33]


SECTION 5. CUMULANTS AND k-STATISTICS


For problems involving the combinations of statistics
(especially if of the fourth or higher degree in X) from
different samples, the student will find substantial economy in
technique and thought by employing Fisher's (1928 and 1934)
k-statistics. The first four k-statistics are given herewith,
in which $\mu_3$ and $\mu_4$ are as here defined and not as
defined by Fisher (see footnote Ch. VI, p. 212).


$k_1 = M$    Fisher's $k_1$ [6:34]

$k_2 = \frac{N}{N-1}V$    Fisher's $k_2$ [6:35]

$k_3 = \frac{N^2}{(N-1)(N-2)}\mu_3$    Fisher's $k_3$ [6:36]

$k_4 = \frac{N^2}{(N-1)(N-2)(N-3)}[(N+1)\mu_4 - 3(N-1)V^2]$    Fisher's $k_4$ [6:37]


These k-statistics are unbiased estimates of Fisher's kappa, or
population, statistics $\kappa_1$, $\kappa_2$, $\kappa_3$, and
$\kappa_4$, called cumulants. The relationships between the
first four cumulants and population moments are:


$\tilde{M} = \kappa_1$    [6:38]
$\tilde{V} = \kappa_2$    [6:39]
$\tilde{\mu}_3 = \kappa_3$    [6:40]
$\tilde{\mu}_4 = \kappa_4 + 3\kappa_2^2$    [6:41]
Some of the merits of cumulants and k-statistics can be
surmised from the following quotation from Fisher (1934,
p. 22): "it is to [Laplace] we owe the principle that the
distribution of a quantity compounded of independent parts
shows a whole series of features -- mean, variance, and other
cumulants -- which are simply the sums of like features of the
distributions of the parts."
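
As an illustration of [6:34] to [6:37], the sketch below computes the four k-statistics from the sample moments M, V, $\mu_3$ and $\mu_4$ (all with N in the denominator). The function name and the five-measure series are assumptions made for the example; only the Python standard library is used.

```python
import statistics


def k_statistics(xs):
    """First four k-statistics, following [6:34]-[6:37]."""
    n = len(xs)
    if n < 4:
        raise ValueError("k4 needs at least four measures")
    m = sum(xs) / n
    # Sample moments from the mean, N in the denominator.
    v, mu3, mu4 = (sum((x - m) ** k for x in xs) / n for k in (2, 3, 4))
    k1 = m
    k2 = n / (n - 1) * v
    k3 = n ** 2 / ((n - 1) * (n - 2)) * mu3
    k4 = n ** 2 / ((n - 1) * (n - 2) * (n - 3)) * ((n + 1) * mu4 - 3 * (n - 1) * v ** 2)
    return k1, k2, k3, k4


sample = [1, 2, 3, 4, 5]  # hypothetical series
k1, k2, k3, k4 = k_statistics(sample)
print(k1, k2, k3, k4)
# k2 agrees with the familiar N-1 variance estimate.
print(statistics.variance(sample))
```

The second k-statistic is the usual variance estimate with N-1 in the denominator, which gives a quick check on the arithmetic.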

In an important sense all the moments beyond 
the first are measures of variability, but they 
measure quite different phenomena and we shall 
follow the custom of referring to the standard 
deviation, the variance, or some comparable 
measure, when speaking of variability, and employ 
the terms skewness and kurtosis for such special 
aspects of variability as are revealed by asymme- 
try and height at the hips. 


SECTION 6. THE COMPUTATION OF THE MEAN AND OF 
MOMENTS 


Let us now consider computational methods for getting the
moments. Raw scores, designated X-scores, and
deviation-from-mean scores (sometimes abridged to "deviation
scores"), designated x-scores, and standard scores, or
deviation-from-mean-in-terms-of-the-standard-deviation scores,
designated z-scores ($z = x/\sigma$), have their respective
merits. (Standard scores are designated x-scores in normal
probability tables, for z is there used to indicate the
ordinate. See [8:23].) Final outcomes must be expressed in
terms of X scores because they are the observed measures.
Algebraic derivations are frequently simplified by employing x
or z scores. None of these well serve most computational needs,
for the benefits of grouping and employing non-fractional
values are not attained by them. We therefore employ a new
variable, $\xi$, which is a score as a deviation from an
arbitrary origin in terms of the grouping interval, i,
employed. We have


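$\xi = \frac{X - \text{Arb. or.}}{i}$, or $X = \text{Arb. or.} + i\xi$    Relation between X and $\xi$ scores [6:42]


If $M_\xi$ is the mean of the $\xi$ scores, we also have


$M = \text{Arb. or.} + iM_\xi$    Mean computed from $\xi$ scores [6:43]

$V = i^2V_\xi$, also $\sigma = i\sigma_\xi$    Variability computed from $\xi$ scores [6:44]


We also have

$x = i(\xi - M_\xi)$, or $\xi = \frac{x}{i} + M_\xi$    Relation between x and $\xi$ scores [6:45]

From [6:45] we easily derive

$V = i^2V_\xi = i^2\left(\frac{\sum\xi^2}{N} - M_\xi^2\right)$    The variance from $\xi$ scores [6:46]

$\mu_3 = i^3\mu_{3\xi} = i^3\left(\frac{\sum\xi^3}{N} - 3M_\xi\frac{\sum\xi^2}{N} + 2M_\xi^3\right)$    [6:47]

$\mu_4 = i^4\mu_{4\xi} = i^4\left(\frac{\sum\xi^4}{N} - 4M_\xi\frac{\sum\xi^3}{N} + 6M_\xi^2\frac{\sum\xi^2}{N} - 3M_\xi^4\right)$    [6:48]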
An important property of the arithmetic mean readily follows
from [6:46] by considering the case in which the grouping
interval equals one.


Then

$\sum x^2 = \sum\xi^2 - NM_\xi^2$

immediately proving that $\sum x^2 < \sum\xi^2$ whenever
$M_\xi$, the distance from arbitrary origin to mean, does not
equal zero. That is, the sum of squared deviations is a minimum
when the point from which the deviations are taken is the mean.
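
A short numerical check of this minimum property may help. The sketch below is illustrative only; the helper name and the trial origins are assumptions, and the three measures are the temperatures 80, 88 and 74 quoted in Section 7. It confirms that the sum of squares about any point P exceeds the sum about the mean by exactly N times the squared distance from P to the mean.

```python
def sum_sq_about(xs, p):
    """Sum of squared deviations of the measures from the point p."""
    return sum((x - p) ** 2 for x in xs)


xs = [80, 88, 74]
n = len(xs)
mean = sum(xs) / n
about_mean = sum_sq_about(xs, mean)

for p in (70, 80, 85, 90):  # arbitrary trial origins
    excess = sum_sq_about(xs, p) - about_mean
    # excess equals N * (M - P)^2, hence is never negative
    print(p, round(excess, 6), round(n * (mean - p) ** 2, 6))
```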

We now have two unique properties of the mean, $\sum x = 0$ and
$\sum x^2$ = a minimum, which are the cause of its algebraic
simplicity and its general superiority as a measure of central
tendency. Formulas [6:43], [6:46], [6:47], and [6:48] are
desirable work formulas in that the arithmetic may be made
simple by an appropriate choice of the arbitrary origin and the
grouping interval. If the grouping is coarse there is a
systematic error in the even moments (but not in the odd
moments) due to grouping which, for certain forms of
distribution, may be allowed for by applying Sheppard's
corrections as given in formulas [6:49], [6:50], wherein
${}_sV$ and ${}_s\mu_4$ are the moments after applying
Sheppard's corrections.


${}_sV = V - \frac{i^2}{12}$    [6:49]

${}_s\mu_4 = \mu_4 - \frac{i^2}{2}V + \frac{7}{240}i^4$    [6:50]

Sheppard's corrections to the second and fourth moments


Because of lack of universal applicability and because they
would interfere with tests of significance, it is very
desirable to employ such a grouping that the corrections are
negligible and may therefore be omitted. Sheppard's correction
in $\sigma$ amounts to one per cent when i = .49 $\sigma$ and
the correction in $\mu_4$ is one per cent when i is
approximately .25 $\sigma$. As, for most of the purposes of
social statistics, the standard deviation is the crucial
measure of variability and a one per cent error in this measure
is immaterial, we will in general endeavor to group for
computational purposes so that i is not greater than .5
$\sigma$, which rule is substantially equivalent to one calling
for 12 or more classes. The reader will appreciate that in
situations calling for more than common precision and in
situations wherein the argument depends from higher moments
than V, either a finer grouping than this should be employed
or, in cases not involving tests of significance, Sheppard's
corrections employed.
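
The size of Sheppard's corrections can be checked directly. The sketch below assumes the usual textbook form of the corrections (subtract $i^2/12$ from V; for the fourth moment subtract $i^2V/2$ and add $7i^4/240$); the function name is hypothetical. It reproduces the two one-per-cent statements above for a normal distribution with unit $\sigma$, where $\mu_4 = 3$.

```python
import math


def sheppard(v, mu4, i):
    """Return (corrected V, corrected mu4) for grouping interval i.
    Assumes the standard form of Sheppard's corrections."""
    v_c = v - i ** 2 / 12
    mu4_c = mu4 - (i ** 2 / 2) * v + (7 / 240) * i ** 4
    return v_c, mu4_c


# Unit normal: V = 1, mu4 = 3.
v_c, _ = sheppard(1.0, 3.0, 0.49)
print(1 - math.sqrt(v_c))        # proportional correction in sigma at i = .49 sigma

_, mu4_c = sheppard(1.0, 3.0, 0.25)
print((3.0 - mu4_c) / 3.0)       # proportional correction in mu4 at i = .25 sigma
```

Both printed proportions come out close to .01, in agreement with the text.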

The computation of the first four moments for the temperature
data of Table IV J, i = 3° column, is shown herewith.


TABLE VI E
THE COMPUTATION OF M, V, $\mu_3$, AND $\mu_4$


  X      f     ξ      fξ     fξ²     fξ³     fξ⁴
 99      2     5      10      50     250    1250
 96      2     4       8      32     128     512
 93      0     3
 90      1     2       2       4       8      16
 87      6     1       6       6       6       6
 84     13     0     (26)           (392)
 81     23    -1     -23      23     -23      23
 78      5    -2     -10      20     -40      80
 75      6    -3     -18      54    -162     486
 72      1    -4      -4      16     -64     256
 69      1    -5      -5      25    -125     625
 66      2    -6     -12      72    -432    2592
                    (-72)          (-846)
 N = 62              -46     302    -454    5846


$M_\xi = \frac{-46}{62} = -.7419$

$V_\xi = \frac{302}{62} - M_\xi^2 = 4.3205$, and $\sigma_\xi = 2.0786$

$\mu_{3\xi} = \frac{-454}{62} - 3M_\xi\frac{302}{62} + 2M_\xi^3 = 2.7024$

$\mu_{4\xi} = \frac{5846}{62} - 4M_\xi\frac{-454}{62} + 6M_\xi^2\frac{302}{62} - 3M_\xi^4 = 87.7375$


Since the interval = 3° we obtain


$M = \text{Arb. or.} + iM_\xi$ = 81.7742, which, as explained
later, should be published as 81.8.


$V = i^2V_\xi$ = 38.8845, published as 39.

$\sigma = i\sigma_\xi$ = 6.2358, published as 6.2.

$\mu_3 = i^3\mu_{3\xi}$ = 72.9648, published as 7. X 10.

$\mu_4 = i^4\mu_{4\xi}$ = 7106.7375, published as 7.1 X 10³.

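The arithmetic of Table VI E can be retraced mechanically. The sketch below is a plain transcription of [6:43], [6:46], [6:47] and [6:48]; the frequencies are those of the table (arbitrary origin 84, interval 3), while the function and key names are assumptions made for the example.

```python
import math

# Class mid-points and frequencies of Table VI E.
FREQ = {99: 2, 96: 2, 93: 0, 90: 1, 87: 6, 84: 13,
        81: 23, 78: 5, 75: 6, 72: 1, 69: 1, 66: 2}


def grouped_moments(freq, origin, interval):
    """Moments from grouped data via xi = (X - origin) / interval."""
    n = sum(freq.values())
    # Mean k'th powers of xi, for k = 1..4.
    s1, s2, s3, s4 = (
        sum(f * ((x - origin) / interval) ** k for x, f in freq.items()) / n
        for k in (1, 2, 3, 4)
    )
    m_xi = s1
    v_xi = s2 - m_xi ** 2
    mu3_xi = s3 - 3 * m_xi * s2 + 2 * m_xi ** 3
    mu4_xi = s4 - 4 * m_xi * s3 + 6 * m_xi ** 2 * s2 - 3 * m_xi ** 4
    return {
        "N": n,
        "M": origin + interval * m_xi,
        "V": interval ** 2 * v_xi,
        "sigma": interval * math.sqrt(v_xi),
        "mu3_xi": mu3_xi,
        "mu4_xi": mu4_xi,
        "mu3": interval ** 3 * mu3_xi,
        "mu4": interval ** 4 * mu4_xi,
    }


result = grouped_moments(FREQ, origin=84, interval=3)
for name, value in result.items():
    print(name, round(value, 4))
```

Small differences in the last decimal of $\mu_3$ and $\mu_4$ arise because the table multiplies the already rounded $\xi$-moments by $i^3$ and $i^4$.
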

It is seen that this method of moments, using grouped data and
an arbitrary origin, has involved quite simple arithmetic. Also
the work involved in getting the variance is directly
contributive in getting the higher moments. Though this method
is universally serviceable, some prefer a method based upon
summations, for adding machines are more frequently available
than multiplying machines. We illustrate a summation method,
Elderton (1905-6), in Table VI F. Though it involves larger
figures than the method of moments, it is applicable to grouped
data, involves positive summations only, provides a check upon
all summations as values appear twice, and involves only such
additions with totals and subtotals as can be printed on an
adding machine tape, so that the large numbers involved have
constituted little mental tax.


TABLE VI F 


THE COMPUTATION OF M, V, $\mu_3$, AND $\mu_4$ BY MEANS
OF SUMMATIONS


  X      f     $S_1$     $S_2$     $S_3$     $S_4$

[Body of Table VI F not recoverable.]


A brief examination of Table VI F suffices to reveal the mode
of computation. The various summations from the arbitrary
origin (here 66) are immediately obtainable from $S_1$, $S_2$,
$S_3$, and $S_4$, from relationships herewith


DES 65. [6:51] 

ser? = 5 [6:52] 
2 

[6: 53] 


та 2 655 


эе 68, 65 [6:54] 




Having the various sums of powers of $\xi$, substitution in
[6:43], [6:44], [6:46], [6:47], and [6:48] yields the identical
statistics already obtained by the method of moments, Table
VI E. The summation method is well adapted to Hollerith and
Powers machine computation.
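
One way a summation method of this kind can work is sketched below. The convention used here is an assumption and may differ from the layout of Table VI F: the classes are numbered $\xi$ = 0, 1, 2, ... upward from the origin class (66), each successive column is a running total of the previous one taken from the top class downward, and $S_k$ is the total of the k'th column omitting the origin class. Under that convention the power sums follow from additions alone.

```python
from itertools import accumulate

# Frequencies of the temperature data, lowest class (66) first.
FREQ_LOW_TO_HIGH = [2, 1, 1, 6, 5, 23, 13, 6, 1, 0, 2, 2]


def power_sums_by_summation(freq):
    """Return (sum f*xi, sum f*xi^2, sum f*xi^3, sum f*xi^4) using only
    repeated running totals. The S_k convention is an assumption."""
    col = list(freq)
    s = []
    for _ in range(4):
        # Running totals from the top class downward, kept low-to-high.
        col = list(accumulate(reversed(col)))[::-1]
        s.append(sum(col[1:]))  # omit the origin class, xi = 0
    s1, s2, s3, s4 = s
    return (
        s1,
        2 * s2 - s1,
        6 * s3 - 6 * s2 + s1,
        24 * s4 - 36 * s3 + 14 * s2 - s1,
    )


by_summation = power_sums_by_summation(FREQ_LOW_TO_HIGH)
direct = tuple(
    sum(f * xi ** k for xi, f in enumerate(FREQ_LOW_TO_HIGH)) for k in (1, 2, 3, 4)
)
print(by_summation)
print(direct)
```

The two printed tuples agree, and the third entry divided by N = 62 gives the 216.2258 that appears in the caution which follows.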

A caution is necessary in using this method, or any method,
involving large positive and negative quantities which nearly
cancel. For example,


$\mu_{3\xi}$ = 216.2258 - 504.2653 + 290.7420 = 2.7023,


which is a five-figure answer though the addends are of seven
figures. To secure a computational accuracy to five figures in
the final answer has required computational accuracy to seven
figures in the parts.


SECTION 7. DECIMAL PLACES TO BE KEPT IN PUBLISHED 
RESULTS 


It is absurd to publish as many figures in final answers as
have been here employed in the illustrative computations. The
mean of the first three temperatures of Table IV B, 80°, 88°,
and 74°, is 80.666666. . . ad infinitum. The best evidence that
we have from these three observations as to the population mean
temperature is this answer kept to an infinite number of
decimal places. Any rounding off whatsoever, e.g., 80.6666667,
constitutes a deviation from the best possible answer yielded
by the data. If, however, the confidence to be placed in the
best answer (which is very small because only three
observations have been utilized) differs from that to be placed
in a rounded-off answer by a scarcely sensible amount, the
rounded-off answer serves every purpose. Furthermore, if the
rounding-off is done just at the decimal place where confidence
should flag, the rounded-off answer is more informative than
the non-abridged answer. Following this principle, the mean of
the three measures should be published as 81°, implying that
there is doubt as to the trustworthiness of the units figure,
i.e., of the last figure published.

There is serious doubt as to any magnitude as small as
one-third a standard error; the chances of a magnitude as great
as this arising merely as a matter of chance are 74 in 100. We
therefore adopt as a practical rule the practice of terminating
a published statistic with the decimal place given by the first
figure of one-third its standard error. This rule, which we
shall call the "one-third sigma rule", is equally applicable to
original observations and to derived statistics. To apply it we
must estimate standard errors where we cannot compute them. For
all the more common and more important generalizing statistics,
formulas giving standard errors are available.
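
The one-third sigma rule lends itself to a small routine. The sketch below is one possible reading of the rule, with a hypothetical function name: it finds the decimal position of the first figure of one-third the standard error and rounds the statistic at that position. The inputs are the statistics and standard errors worked out elsewhere in this chapter for the temperature data.

```python
import math


def publish(value, standard_error):
    """Round `value` at the decimal place of the first significant
    figure of one-third of its standard error."""
    third = standard_error / 3
    place = math.floor(math.log10(third))  # 0 = units, -1 = tenths, 1 = tens
    return round(value, -place)


print(publish(81.7742, 0.79))     # the mean
print(publish(38.8845, 9.7))      # the variance
print(publish(6.2358, 0.78))      # the standard deviation
print(publish(72.9648, 76))       # the third moment
print(publish(7106.7375, 1897))   # the fourth moment
```

Rounding at the place of the first figure is a design choice; a stricter reading that rounds the one-third value itself before locating its first figure would give the same places for these five statistics.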


SECTION 8. VARIANCE ERRORS AND STANDARD ERRORS OF MOMENTS


Since a standard error is the square root of a variance error,
it suffices to give formulas for variance errors.


Since the variance of the variance is
$(\overline{V^2} - \overline{V}^2)$ we may utilize [6:26] and
[6:10] and obtain


$V_V = \frac{(N-1)^2}{N^3}\tilde{\mu}_4 - \frac{(N-1)(N-3)}{N^3}\tilde{V}^2$    Variance of variance (samples of any size and parent populations of any form) [6:55]


Since the sample $\mu_4$ and $V^2$ are unbiased estimates of
$\overline{\mu_4}$ and $\overline{V^2}$ and not of
$\tilde{\mu}_4$ and $\tilde{V}^2$, an answer in terms of
$\overline{\mu_4}$ and $\overline{V^2}$, for which we can
substitute $\mu_4$ and $V^2$ without introducing any systematic
bias, is desirable. We can derive such a formula by utilizing
[6:10], [6:26], and [6:28]. The final result is


$V_V = \frac{1}{N(N-2)(N-3)}[(N-1)^2\mu_4 - (N^2-3)V^2]$    Variance error of sample variance, samples of any size and parent population of any form [6:56]


When N is large this reduces to


$V_V = \frac{1}{N}(\mu_4 - V^2)$    Variance error of sample variance, N large [6:57]


When the parent population is normal [6:56] reduces to


$V_V = \frac{2V^2}{N+1}$    Variance of variance when population is normal, see [13:147] [6:58]


In Chapter XIII, Section 8, the variance of the standard
deviation is derived. It is


$V_\sigma = \frac{V_V}{4V}$    Variance of $\sigma$, N large [6:59]


When N is large and the population normal this leads to


$\sigma_\sigma = \frac{\sigma}{\sqrt{2N}}$    Standard error of $\sigma$, N large and population normal [6:60]


Small sample formulas for $V_{\mu_3}$ and $V_{\mu_4}$ become
complicated, but the following, given by Pearson (1902-1903 and
1914), are frequently serviceable


$V_{\mu_3} = \frac{1}{N}(\mu_6 - \mu_3^2 + 9V^3 - 6V\mu_4)$    Variance of $\mu_3$, N large [6:61]

From which we obtain, approximately,


$V_{\mu_3} = \frac{6V^3}{N}$    Variance error of third moment, N large and population normal [6:62]


$V_{\mu_4} = \frac{1}{N}(\mu_8 - \mu_4^2 + 16V\mu_3^2 - 8\mu_3\mu_5)$    Variance of $\mu_4$, N large [6:63]


From which we obtain, approximately,


$V_{\mu_4} = \frac{96V^4}{N}$    Variance error of fourth moment, N large and population normal [6:64]


We may use these formulas to estimate the variance errors, and,
by extracting the square roots, the standard errors, of the
several statistics computed from the data of Table VI E, or
Table VI F.

By [6:18] we find that $\sigma_M$ = .79 and since one-third of
this = .3, the mean is published as 81.8, indicating that the
.8 is in doubt though it is better than a random guess.

By [6:56], which is the master or most trustworthy formula we
have for $V_V$, we find $V_V$ = 94 or $\sigma_V$ = 9.7. Also,
utilizing [6:59] we find $\sigma_\sigma$ = .78. We accordingly
publish V as 39. and $\sigma$ as 6.2.
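
These figures can be retraced from N = 62, V = 38.8845 and $\mu_4$ = 7106.7375. The sketch below assumes the readings of the formulas used in this section, namely [6:56] as $[(N-1)^2\mu_4 - (N^2-3)V^2]/[N(N-2)(N-3)]$, [6:59] as $V_\sigma = V_V/4V$, and [6:60] as $\sigma/\sqrt{2N}$; the variable and function names are hypothetical.

```python
import math

N, V, MU4 = 62, 38.8845, 7106.7375  # from Table VI E


def variance_error_of_variance(n, v, mu4):
    """Small-sample variance error of the sample variance, read as [6:56]."""
    return ((n - 1) ** 2 * mu4 - (n ** 2 - 3) * v ** 2) / (n * (n - 2) * (n - 3))


v_v = variance_error_of_variance(N, V, MU4)
se_v = math.sqrt(v_v)
se_sigma = math.sqrt(v_v / (4 * V))                # read as [6:59]
se_sigma_normal = math.sqrt(V) / math.sqrt(2 * N)  # [6:60], normality assumed

print(round(v_v), round(se_v, 1), round(se_sigma, 2), round(se_sigma_normal, 2))
```

The first three values match the 94, 9.7 and .78 quoted above; the last is the smaller value that the normal-population formula [6:60] yields.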

It is informative to see what the approximate formulas [6:58]
and [6:60] yield. This latter is an important formula in that
it does not involve the third or fourth moments and is widely
used. By [6:58] we obtain, to two figures, the same answers as
before, suggesting that [6:58] is satisfactory if N is as large
as 62. Knowledge of the first significant figure of a standard
error usually suffices to reveal the trust that can be placed
in a statistic. Formula [6:60] yields $\sigma_\sigma$ = .56.
This value, though definitely too small (because the assumption
of normality was unsound), is nevertheless much to be preferred
to no estimate at all as to the trustworthiness of $\sigma$.

Not having fifth, sixth, and eighth moments we cannot
substitute sample moments for the unknown population moments
and compute the variance errors of the third and fourth moments
by the more accurate formulas [6:61] and [6:63]. We shall use
the short formulas [6:62] and [6:64] knowing that resulting
values will be much too small. By [6:62] we obtain
$V_{\mu_3}$ = 5783. (to be published as 6. X 10³, thereby
revealing uncertainty as to the first figure),
$\sigma_{\mu_3}$ = 76. (= 8. X 10), and $\sigma_{\mu_3}/3$ = 25.
Though this standard error is undoubtedly much too small, it is
still probable that the first figure of $\mu_3$ is better than
a guess, so we publish it as 7. X 10. By [6:64] we obtain
$V_{\mu_4}$ = 3597885. (= 4. X 10⁶), $\sigma_{\mu_4}$ = 1897
(= 2. X 10³), and $\sigma_{\mu_4}/3$ = 632 (= 6 X 10²). Though
this value be too small, it may be that the first two figures
of $\mu_4$ have some merit, so we publish it as 7.1 X 10³.

The unreliability here found for the higher moments is somewhat
larger for these leptokurtic data than would be the case with a
sample of equal size drawn from a normal population, but in
general the higher moments are unreliable, so where possible
crucial issues should be made to depend from statistics that do
not involve them.

With reference to the reliability of estimates of the
population variance and standard deviation, we note that since
the estimate of the population variance is [N/(N-1)]V and the
estimate of the population standard deviation is
$\sqrt{N/(N-1)}\,\sigma$, the standard error of the former can
be gotten by multiplying that of V by N/(N-1) and the standard
error of the latter can be gotten by multiplying that of
$\sigma$ by $\sqrt{N/(N-1)}$.

Treating the mean, variance, third and fourth 
moments in the same general situation has enabled 
a certain survey of types of statistics. Though 
we shall return again to the most important type, 

measures of simple variability of distributions 
and of derived statistics, the student should 
now have the basic concepts of variability and 
in particular of standard error, which will en- 
able him to appraise the different measures of 
central tendency treated of in Chapter VII. 


SECTION 9. THE AVERAGE DEVIATION 


Though the standard deviation (or its square, the variance) is
the most serviceable measure of dispersion because (1) of the
importance and simplicity of the algebraic relationships in
which it is found and because (2) it is generally the most
reliable, there are many other measures, some of which have
genuine merit for one reason or another.

The average deviation is nearly as reliable, in most
situations, as the standard deviation, and as it is more
readily computed it may be serviceable if the data are so
extensive that the labor of computation is a decisive factor.
It lacks the simplicity and properties that endear the standard
deviation to the statistician, so when used it should be as a
terminal statistic, one not used in further combinatorial
processes.

The average deviation is defined as the arithmetic mean of the
absolute values of the deviations of the measures in a series
from their mean. Thus


A. D. $= \frac{\sum|X-M|}{N}$    Definition of Average Deviation [6:65]




If the absolute values of deviations are taken from some point,
P, other than the mean, the resulting value must be labeled
"average deviation from the point P."


A. D. from P $= \frac{\sum|X-P|}{N}$    [6:66]


We have

$\sum|X-P| = \sum^{X<P}(P-X) + \sum^{X \ge P}(X-P)$


The superscript "X<P" indicates that the summation covers all
values for which X is less than P. Values of X = P may be
thought of as located in either of the right hand member
summations, for these null differences will not affect the
outcome.


$\sum|X-P| = \sum^{X<P}(P-X) + \sum^{X<P}(P-X) + \sum^{X<P}(X-P) + \sum^{X \ge P}(X-P)$

$= 2\sum^{X<P}(P-X) + \sum(X-P)$

$= 2\sum^{X<P}P - 2\sum^{X<P}X + \sum X - NP$


If the number of measures below P is b, this may be written


$\sum|X-P| = (2b-N)P + \sum X - 2\sum^{X<P}X$

So that

A. D. from P $= \frac{1}{N}\left[(2b-N)P + \sum X - 2\sum^{X<P}X\right]$    [6:67]


If measures are arranged in a frequency distribution and their
sum, $\sum X$, gotten on an adding machine, it is only
necessary to get the sub-total at the appropriate point to have
$\sum^{X<P}X$. The average deviation from P thus is a very
simple statistic to compute. If the A. D. is desired, a single
listing, with a few sub-totals in the neighborhood of the
anticipated mean, will provide the two summations needed to
yield both the mean and the average deviation. When P = M,
formula [6:67] becomes


A. D. $= \frac{2}{N}\left(bM - \sum^{X<M}X\right)$    The average deviation [6:68]


When P = the median, formula [6:67] yields the average
deviation from the median. This is a minimum. Otherwise
expressed, the median is such a point that the sum of the
absolute values of the deviations from it of the measures in
the series is a minimum. The proof of this, which is very
simple when all the measures in the series are different, is
left as an exercise. To recapitulate: The basic and unique
properties of the median are (1) the number of measures below
it is equal to the number above it, and (2) the sum of the
absolute deviations from it is a minimum.
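
Formula [6:67] and the minimum property of the median can both be checked on a small series. The sketch below is illustrative only; the five measures and the function names are assumptions made for the example.

```python
import statistics


def ad_from_point(xs, p):
    """Average deviation from p by [6:67]: one grand total, one
    sub-total of the measures below p, and the count b below p."""
    n = len(xs)
    below = [x for x in xs if x < p]
    b = len(below)
    return ((2 * b - n) * p + sum(xs) - 2 * sum(below)) / n


def ad_direct(xs, p):
    """Average deviation from p straight from the definition [6:66]."""
    return sum(abs(x - p) for x in xs) / len(xs)


xs = [74, 80, 88, 90, 95]  # hypothetical measures
mean = statistics.mean(xs)
median = statistics.median(xs)

for p in (70, 80, mean, median, 100):
    print(p, ad_from_point(xs, p), ad_direct(xs, p))
```

The two columns agree at every trial point, and the smallest value occurs at the median (88), not at the mean (85.4).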

The reliability of the average deviation. Since the average
deviation from an arbitrary fixed point is simply the
arithmetic mean of N absolute values, its standard deviation is
given by the usual formula [6:18] for the standard error of the
mean, using the absolute values (Y below) as the measures of
the series.

Let X = the original measures of the series.

Let P = a fixed point, or value of X.

Let Y = |X-P|.

Then A. D. from P = the mean Y = $\overline{Y} = \frac{\sum Y}{N}$

and

$\sigma_{\text{A.D. from } P} = \frac{\sigma_Y}{\sqrt{N-1}}$    Standard deviation of average deviation from fixed point P [6:69]


If a point which is a linear function of the data, as is the
mean, is the point of reference, one degree of freedom is
consumed thereby. Thus we have for the standard deviation of
the average deviation:


Let X = the original measures of the series.

Let Z = |X-M|.

Then A. D. = the mean Z = $\overline{Z}$

and

$\sigma_{\text{A.D.}} = \frac{\sigma_Z}{\sqrt{N-2}}$    Standard deviation of the average deviation [6:70]


Though in general the standard deviation is a more reliable
measure than the average deviation, i.e.,
$\sigma/\sigma_\sigma$ is greater than
A.D./$\sigma_{\text{A.D.}}$, leptokurtic distributions exist
for which this is not the case. In R. A. Fisher (1921) is a
basic treatment of "consistent" and "efficient" statistics.


SECTION 10. BASED UPON PERCENTILES 


The distance between any two percentiles, $P_p - P_{p'}$, is an
inter-percentile range and is a measure of variability, though
a very poor measure if the two proportions determining the
percentiles are near together. It is also a poor measure if
either of the proportions is very near zero or one, for the
corresponding percentile is much less reliable in the case of
unimodal distributions not arbitrarily bounded at an extreme
than a percentile somewhat removed from zero or one hundred.

The most reliable interpercentile range is a function of the
form of the population distribution. The standard error of
$P_p - P_{p'}$ is derivable from the standard deviation of a
percentile [4:03] and the covariance (as explained in Chapter
X, Section 5) between percentiles. Using c to indicate
covariance, p > p', and other symbols as earlier defined, it
can be shown that


iq at p Covariance between P 


E n and P г percentiles *[6: 11] 
Р Р 


c (РР!) = М 
Utilizing [4:03] and the formula for the variance 
of a difference [10:106] we obtain 

ра, i? ра! 2igi'p' 
f 10 E fy 


variance of an interpercentile range [6:72] 


) 


The most reliable interpercentile range is that one for which
$(P_p - P_{p'})/\sigma_{(P_p - P_{p'})}$ is a maximum. This can
be determined when the form of distribution is known, and the
writer has shown (Kelley 1921) that, to a close approximation,
this is, in the case of a normal distribution, the distance
from the 7th to the 93rd percentiles. This can be designated
the normal optimal interpercentile range, but since it has
merit for many non-normal unimodal distributions such as are
commonly found, we will give it a general designation and call
it Pv.

$Pv = P_{.93} - P_{.07}$    A meritorious percentile measure of variability [6:73]


Letting p = .93 and p' = q = .07, the variance of Pv is given
by [6:72] which then becomes


.0651 m .0651 HE .0098 1,1, 


= — 


+ — 


f? fe Pup 
Variance of error of Pv [6:74] 


A percentile measure of variability that has had wide usage is
the quartile deviation, or the semi-interquartile range, which
is commonly designated Q. The reader must distinguish between Q
and $Q_1$, $Q_2$, and $Q_3$, which are common designations of
the 1st, 2nd, and 3rd quartiles or $P_{.25}$, $P_{.50}$, and
$P_{.75}$ respectively.


$Q = \frac{P_{.75} - P_{.25}}{2}$    Quartile deviation [6:75]


The variance error of Q is given by [6:76]

ENS 1875 1 ОТО d a 25 
VL == * - 


манал” f? Р, ХООЛ 
р Р POP 
Variance error of quartile deviation [6:76] 
SECTION 11. COMPARISON OF SUNDRY MEASURES OF 
VARIABILITY 


The relative standard error of Q is generally much larger than
that of Pv, which in turn is generally larger than that of
A.D., which in turn is generally larger than that of $\sigma$.
The relationships of these measures of variability in a normal
distribution are shown in Table VI G.


TABLE VI G


RATIOS OF CERTAIN MEASURES OF VARIABILITY TO THEIR STANDARD
ERRORS IN THE CASE OF SAMPLES DRAWN FROM A NORMAL POPULATION


STANDARD                                      SIZE OF SAMPLE TO YIELD
ERROR        CRITICAL RATIOS                  RELATIVE RELIABILITY EQUAL
FORMULAS                                      TO THAT OF σ FROM A
                                              SAMPLE OF 100

[6:60]*      σ/σ_σ       = 1.4142 √N          100
[6:70]*      A.D./σ_A.D. = 1.3236 √N          114
[6:74]*      Pv/σ_Pv     = 1.1421 √N          153
[6:76]*      Q/σ_Q       =  .8573 √N          272


* Formulas used, except as modified to fit theoretical normal
distributions.
Table VI G shows that the average deviation is nearly as
reliable as the standard deviation, that the quartile deviation
is a very unreliable measure, and that Pv, though the best of
the interpercentile measures, requires 53 per cent more cases
than the standard deviation to obtain comparable reliability in
the case of samples drawn from a normal population. The formula
for the variance of an interpercentile range [6:72] reveals
that the 10th to 90th percentile range is nearly as reliable as
Pv and may well be used in lieu thereof in situations wherein
the 10th and 90th percentiles are already available.
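
The entries of Table VI G can be recomputed for a unit normal population. The sketch below is not the book's own working: it uses the large-sample variance of a symmetric interpercentile range written in terms of the normal density at the percentile, rather than class frequencies and grouping intervals, and the function names are hypothetical. Only the Python standard library (`statistics.NormalDist`) is used.

```python
import math
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def range_ratio(p):
    """(P_p - P_(1-p)) divided by its large-sample standard error,
    per sqrt(N), for a unit normal population."""
    q = 1 - p
    z = UNIT_NORMAL.inv_cdf(p)
    density = UNIT_NORMAL.pdf(z)
    # N * variance: pq/f^2 + p'q'/f'^2 - 2qp'/(f f'), with p' = q, f' = f
    n_times_variance = (2 * p * q - 2 * q * q) / density ** 2
    return 2 * z / math.sqrt(n_times_variance)


ratios = {
    "sigma": math.sqrt(2),
    "A.D.": math.sqrt((2 / math.pi) / (1 - 2 / math.pi)),
    "Pv": range_ratio(0.93),
    "Q": range_ratio(0.75),   # halving a range leaves its ratio unchanged
    "P90-P10": range_ratio(0.90),
}

for name, ratio in ratios.items():
    equivalent_n = 100 * 2 / ratio ** 2   # cases needed to match sigma at N = 100
    print(f"{name:8s} {ratio:.4f}  {equivalent_n:.0f}")
```

The last line of output bears on the remark above about the 10th to 90th percentile range: its ratio falls only slightly below that of Pv.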

Where considerations of simplicity or immediate meaningfulness
of statistic have led to the use of percentiles it is
consistent to employ an interpercentile measure as a measure of
variability, and either Pv or the 10 to 90 percentile range is
recommended. In this same situation percentile measures of
skewness and of kurtosis, as given in Chapter VII, are
recommended, but the statistician will realize that all these
percentile measures are terminal statistics and very cumbersome
to incorporate into further computational procedures.


CHAPTER VII 
MEASURES OF CENTRAL TENDENCY 


SECTION 1. THE EFFECT OF FORM OF DISTRIBUTION UPON 
DIFFERENT AVERAGES 


Averages: In common usage the term "average" indicates the sum
of the measures divided by their number, but in statistical
parlance this is the "arithmetic mean," and any measure
whatsoever of central tendency is an "average." According to a
common definition any function of a series of measures is an
average of them if it equals them in the special case when they
all have the same value. This definition is so broad as not in
itself to guarantee useful properties, so we will consider a
class of averages which are more restricted.

If $X_1, X_2, X_3, \ldots, X_N$ are all positive and are the
measures in the series, a general expression giving many
averages, but not all possible ones, is


$f(b) = \left(\frac{\sum X^b}{N}\right)^{1/b}$    Generalized mean [7:01]


As b takes different values we obtain different
sorts of means. When b=1, equation [7:01] yields

the most common mean, the arithmetic mean. When b=-1, equation
[7:01] yields the harmonic mean. When b=2, equation [7:01]
yields the quadratic mean. This is not the standard deviation
because $\sum X$ does not = 0. When b=0, equation [7:01] yields
an indeterminate form, which can be shown to be the geometric
mean.*


* By definition we have the following for the geometric mean

$G.M. = (X_1 X_2 \cdots X_N)^{1/N}$

or

$\log G.M. = \frac{\sum(\log X)}{N}$

We also have

$[f(b)]^b = \frac{X_1^b + X_2^b + X_3^b + \cdots + X_N^b}{N}$

Since $X^b$ can be expanded into a convergent series, as follows:

$X^b = 1 + b\log X + \frac{b^2(\log X)^2}{2!} +$ higher powers in b

we can set

$X^b = 1 + b\log X$, as b approaches zero.

Thus

$[f(b)]^b = \frac{1}{N}(1 + b\log X_1 + 1 + b\log X_2 + \cdots + 1 + b\log X_N) = 1 + b\log G.M. = (G.M.)^b$

Giving

$f(b) = G.M.$, as b approaches zero.
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When b = -9, equation [7:01] yields the smallest 
measure in the series and when b = ∞, it yields
the largest measure. In all cases f(b) increases
as b increases from -∞ to ∞, immediately proving
that all of the means given by [7:01] are in-
ternal means, i.e., they fall within the range
of the measures in the series, and also proving
that the harmonic mean is smaller than the geo-
metric mean, which is smaller than the arithmetic
mean, which is smaller than the quadratic mean,
etc. If reasons inherent in, or exterior to, the
data assert that small measures should play a
more important part than large measures in de-
termining the average, it follows that b of
equation [7:01] should, say, equal 0 or -1,
rather than 1, i.e., that the geometric or the
harmonic mean should be used rather than the
arithmetic mean. Averages other than those de-
fined by [7:01] exist, such as the median and
the mode.
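
The behavior of the generalized mean as b changes can be tried numerically. The following is a minimal sketch in Python, not part of the original text; the function name and the three measures are hypothetical, and the measures are assumed to be positive so that the harmonic and geometric means exist.

```python
import math

def generalized_mean(xs, b):
    """Generalized mean f(b) = (sum(X**b) / N) ** (1/b) for positive measures."""
    n = len(xs)
    if b == 0:
        # Limiting case as b approaches zero: the geometric mean.
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** b for x in xs) / n) ** (1.0 / b)

measures = [2.0, 4.0, 8.0]  # hypothetical data, chosen only for illustration

harmonic = generalized_mean(measures, -1)
geometric = generalized_mean(measures, 0)
arithmetic = generalized_mean(measures, 1)
quadratic = generalized_mean(measures, 2)

print(harmonic, geometric, arithmetic, quadratic)
```

For these measures the four values increase in the order harmonic, geometric, arithmetic, quadratic, and all lie between the smallest measure, 2, and the largest, 8, as the text asserts for internal means.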

The averages that we shall consider in some 
detail are the median, the mode, the geometric 
mean, the harmonic mean, and the arithmetic mean, 
which last will frequently be called "the mean." 

The usual order of trustworthiness of various
measures of a distribution: For many distribu-
tions of quantitative data the features of great-
est stability are, in order:

1. The number of cases, N. This is usually
fixed by the experimenter and thus has no samp-
ling or experimental error. If the principle of
sampling does not fix the number of cases, as
would be the case if the ages of all the men
passing a selected spot in a certain hour were
recorded, then of course there is a sampling
error in N, but this is not the usual mode of
sampling. We may therefore generally think of
the number of cases in the sample as having no
sampling error.

2. Some measure of variability, generally the
standard deviation. Contrary to the usual un-
tutored expectation which would place the mean
first, it is generally a fact that a measure of
variability has greater reliability.

3. Some average, generally the arithmetic
mean.

4. Some measure of skewness, such as μ_3/σ_{μ_3}
(see [6:47], [6:61], and [6:62]), As/σ_{As} (see
[7:16] and [7:17]), β_1/σ_{β_1} (see K. Pearson,
Tables), Sk/σ_{Sk} (see [7:18] and [7:19]), or
g_1/σ_{g_1} (see R. A. Fisher, 1934).

5. Some measure of kurtosis, such as (β_2-3)/σ_{β_2}
(see K. Pearson, Tables), (Ku-2.7885)/σ_{Ku} (see
[7:08] and [7:09]), or g_2/σ_{g_2} (see R. A. Fisher,
1934).


6. Some measures of multimodality. 

The order of reliability of statistics here 
given is not invariable for it is affected by 
the form of distribution of the data. Certain 
unusual forms may augment the reliability of 
some of the later measures in this list, or of 
other measures not mentioned, while decreasing 
that of some of the earlier measures. Also the 
list is suspect because the basis of comparison 
of the disparate measures has not been given and 
defended. Is the basis that of comparing per-
centage errors, or absolute errors, or something
else? A defense based upon statistical concepts
thus far presented in this text is impossible,
so it must suffice to ask the reader to return
to this issue of the order of trustworthiness of
different measures as his knowledge about them
increases. Until he can form an independent
judgment, based upon his particular data, he is
advised to attempt so to set up his issues that
statistics low in number, that is, early rather
than late in this list, provide the requisite
information. For example, suppose the question
is, "Are there more men than women of a high
level of intelligence?" One
study might compare averages of men and women,
another compare variabilities of men and of wo-
men, and another investigate the kurtosis of men
and of women. Since each of these bears upon the
issue and since variability is number two in the
list, the average number three, and kurtosis
number five, the initial approach suggested is
via a study of variabilities of men and women.
Certainly a study based upon proof of multi- 
modality should be undertaken as a sort of last 
resort, that is, only in case no measures lower 
in the list will answer the issue. 

Common forms of distribution: Charts VII I, 
VII II, VII III, show various Pearson-type curves. 
The equations of these curves will not now con-
cern us. Where two curves are shown under a
single-type number, one of them is a rather typi-
cal curve of this type and the other an extreme
curve in one direction of the type in question, 
and the letter near the left or right margin 
gives the extreme curve in the other direction. 
For example, the two category type varies from 
the M (for Mendelian, because of the importance 
in heredity studies of the two-point distribu- 
tion having equal frequencies in the two cate- 
gories) curve wherein are equal frequencies in 
two classes through the curve having unequal 
frequencies in two classes to the limiting case 
where all the frequencies lie in one class. The 
letters M, R (rectangular), N (normal), P (para- 
bolic), E (exponential), and L (straight line) 
stand for curves as illustrated in the first six 
figures of Chart VII I. All curves having a
single mode not at a dictated upper or lower
boundary are referred to as unimodal, or "i", or
"A", curves; all having an anti-mode are called
anti-modal, or "u", curves, and all having neither
a mode nor anti-mode except at a boundary are
called J-shaped, or "J", curves. These Pearson-
type curves seem to cover pretty well the range
of the simpler phenomena of distribution. One
omission which should be classed as simple is
the point distribution. This consists of fre-
quencies at discrete points along a graduated
scale. As an example may be mentioned the number
of children in families: the frequencies occur
at 0, 1, 2, 3, etc., but never at fractional
values. Many point distributions, especially
those having ten or more classes, may be excel-
lently handled as one would handle grouped data
of a continuous variable. Certain cautions will
be obvious. For example, we may speak of the
mean number of children per family as 2.30, but
we hardly should say that the median or the modal
family has a fractional number of children. (See
Elderton, 1927.)

[Chart VII I (in part): Pearson-type curves. Legible panel titles: E. Exponential: Type X; L. Line: point of division between two types; A. Leptokurtic curve, range from -∞ to ∞.]


SECTION 2. THE MEDIAN


The computation of the median, or the fiftieth
percentile, P_{.50}, and of its standard error has
been given in Chapter IV, Section 3. Formula
[4:03] giving the standard error of any per-
centile is

$$\sigma_{P_p} = \frac{i}{f_p}\sqrt{Npq}$$    See [4:03]

For the median p=q=.5 and this formula becomes

$$\sigma_{Mdn} = \frac{i\sqrt{N}}{2 f_p}$$    [7:02]

Equation [7:02] may be written

$$\sigma_{Mdn} = \frac{1}{2\sqrt{N}\,\dfrac{f_p}{iN}}$$    [7:02]

* Elderton's "twisted j-shaped" type is one in which
the small tail of the j turns down instead of up.

The quotient f_p/(iN) is the proportionate density
of frequency per unit interval, and as this is
found in the denominator we see that the standard
error of the median decreases as the density of
cases increases. Accordingly, the median becomes
a better and better measure as the density of
cases in the neighborhood of the median increases.
Though, generally speaking, the mean is more 
reliable than the median, this is not so if the 
distribution is sufficiently leptokurtic. Curve 
A, Chart VII I, is a symmetrical unimodal lepto- 
kurtic curve for which, approximately, the stan- 
dard error of the median is equal to that of the 
mean. This is a curve having greater relative 
frequency both near the middle (which increases 
the reliability of the median) and near the ex- 
tremes (which decreases the reliability of the 
mean) than does the normal curve. Another word- 
ing which applies to unimodal curves, is that a 
leptokurtic curve is low at the hips, positions 
intermediate between the central and the extreme 
portions. A mesokurtic curve, of which the 
normal curve is an example, has medium hip height, 
and a platykurtic curve is high at the hips. 
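
The dependence of the standard error of the median upon the density of cases at the median can be illustrated numerically. The sketch below is not from the original text. It assumes the grouped-data form i√N/(2 f_p), in which i is the class interval and f_p the frequency of the class containing the median, and it assumes the familiar σ/√N for the standard error of the mean. The figures are hypothetical and are chosen to resemble a normal distribution with σ = 5.

```python
import math

def se_median_grouped(i, f_p, n):
    """Standard error of the median: i * sqrt(N) / (2 * f_p).

    i   -- width of the class interval
    f_p -- frequency in the class containing the median
    n   -- number of cases, N
    """
    return i * math.sqrt(n) / (2.0 * f_p)

def se_mean(sigma, n):
    """Standard error of the mean, sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

# Hypothetical sample: N = 400, interval 2, 64 cases in the median class,
# standard deviation 5 (density 64 / (2 * 400) = .08 per unit interval).
se_mdn = se_median_grouped(i=2.0, f_p=64.0, n=400)
se_m = se_mean(sigma=5.0, n=400)
print(se_mdn, se_m, se_mdn / se_m)
```

With these figures the standard error of the median is .3125 and that of the mean .25. Doubling the frequency in the median class, other things equal, would halve the former, which is the sense in which a greater density of cases near the median favors the median.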

The more leptokurtic the distribution the 
greater the merit of the median as an average, 
but we must note that a high degree of lepto- 
kurtosis is necessary before the standard error 
of the median becomes less than that of the mean. 
We will shortly provide measures of kurtosis, 
but even without them the reader can get a good 
visual impression of Curve A and then expect that
if his data are still more leptokurtic the median
is to be preferred to the mean as a measure of
central tendency.

For all distributions the simplicity of meaning
of the median and of interpercentile ranges highly
recommends them for use with popular audiences,
but only in the case of highly leptokurtic distri-
butions do they have superior status because of
reliability.

An exact and general quantitative relationship
between the kurtosis of a curve and the relative 
excellence of mean and median is impossible 
because distributions with the same kurtosis, 
[7:03], may be non-identical in other respects 
such as the relative frequency in the neighbor- 
hood of the median. 


$$\beta_2 = \frac{\mu_4}{\mu_2^{2}}$$    Pearson's measure of kurtosis    [7:03]

β_2 = 3.00 for mesokurtic distributions, of which
the normal is an example. Thus for a test of
divergence from mesokurtosis we have

$$\frac{\beta_2 - 3}{\sigma_{\beta_2}}$$    Critical ratio, providing mesokurtosis test based upon β_2    [7:04]

Pearson (1914) has provided a table from which
an approximate and entirely serviceable value of
σ_{β_2} can be found.
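
As an illustration only, β_2 may be computed from the second and fourth moments about the mean of a set of measures. The sketch below is not from the original text; the function name and the data are hypothetical, and the moments are the simple sample moments with divisor N.

```python
def beta2(xs):
    """Pearson's beta_2 = mu_4 / mu_2**2, using sample moments about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu4 = sum((x - mean) ** 4 for x in xs) / n
    return mu4 / mu2 ** 2

# Two equally frequent points: the most platykurtic case possible.
two_point = beta2([-1.0, 1.0])
# Five equally spaced, equally frequent values.
flat = beta2([-2.0, -1.0, 0.0, 1.0, 2.0])
print(two_point, flat)
```

Both hypothetical series give values well below the mesokurtic 3.00, namely 1.0 and 1.7, as is to be expected of distributions having no tails.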


The obvious and crucial test of excellence of
mean and median is to compute the standard errors
of each for the distribution in question and
assert that the one with the smaller standard
error is the more reliable. Some practice in
doing this will provide the student with such
insight that he can generally tell which is the
more reliable measure by looking at the graph of
the distribution. Equations [7:05] and [7:06]
following are of two leptokurtic distributions
having equally reliable mean and median. In ap-
pearance each deviates slightly from Curve A of
Chart VII I. Curve [7:05] is one given by the
sum of two normal distributions and [7:06] is a
Pearson Type VII curve.

When in an equation e has an exponent, f(x),
which is elaborate, it makes for typographical
simplicity to write "exp f(x)" in lieu of e^{f(x)},
as has been done in [7:05]:


N x. x? -k х? 


2 
„ш К pr Ta ы 
= 72 s ехр К, ехр 222 [7:05] 


902 
in which К, and k, the roots of 

k2/.5m + V.5m + .5 - V.25 + m71.1320 ог .1146. 
The fourth moment of [7:05] is 


24142 5615 1- (4-1) VERS 2" 
„= tom ap ие = 4 зао 


(1 - VI + 47) 
Thus for this curve β_2 = 4.3333.


N 
ЕНИ ee ae АЕО 
2 ЯГ 

810105) ee 
2.6862 c 
The Pearson β_2 is 11.7434, but the standard
error of β_2, as computed by Pearson's method,
for this very leptokurtic Type VII curve is in-
finite. We may use the following as a prelimi-
nary guide, but not as a substitute for the cru-
cial test mentioned:

$$\beta_2 > 5$$    β_2 standard suggesting the median as a more reliable measure than the mean    [7:07]


A similar standard, based upon percentiles,
and one having a finite standard error even with
extreme leptokurtosis, may be based upon the
quotient of certain interpercentile ranges. Trial
suggests that for this purpose the quotient be-
tween a greater and a lesser interpercentile
range than Pv will serve. Calling the 3 to 97
interpercentile range Tns (initials of "three-
ninety-seven") we have

$$Ku = \frac{P_{.97} - P_{.03}}{P_{.75} - P_{.25}} = \frac{Tns}{P_{.75} - P_{.25}}$$    Percentile measure of kurtosis    [7:08]

The variance error of Ku is obtainable from
[7:09] in which p = .97 and p' = .75.
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“20166 f Variance error of Ku [1:09] 


For [7:05] Ku = 4.60 and its standard error =
8.27/√N. We may use the following as a guide:

$$Ku > 5$$    Ku standard suggesting the median as a more reliable measure than the mean    [7:10]

In a normal distribution

$$Ku = 2.7885$$    Mesokurtic Ku    [7:11]

In view of the extreme unreliability of higher
moments, and even of the standard deviation, in
highly leptokurtic distributions we may conclude
that when the median is preferred to the mean an
interpercentile measure of variability, such as
Pv [6:73], is to be preferred to the standard
deviation, a percentile measure of skewness,
such as Sk [7:18], is to be preferred to β_1
[7:12], and a percentile measure of kurtosis,
such as Ku [7:08], is to be preferred to β_2
[7:03].
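
The mesokurtic value of Ku can be checked from the normal curve itself. The sketch below is not part of the original text. It assumes that Ku is the 3 to 97 interpercentile range divided by the 25 to 75 interpercentile range, which is how the measure and the values p = .97 and p' = .75 are read here, and it uses the `NormalDist` class of Python's standard `statistics` module for the normal percentiles.

```python
from statistics import NormalDist

def ku_from_inverse_cdf(inv_cdf):
    """Ku = (P.97 - P.03) / (P.75 - P.25), given a percentile function."""
    tns = inv_cdf(0.97) - inv_cdf(0.03)        # the "three-ninety-seven" range
    quartile_range = inv_cdf(0.75) - inv_cdf(0.25)
    return tns / quartile_range

ku_unit_normal = ku_from_inverse_cdf(NormalDist().inv_cdf)
# A normal curve with a different mean and standard deviation (hypothetical
# values) gives the same Ku, since the measure is independent of units.
ku_other_normal = ku_from_inverse_cdf(NormalDist(mu=10.0, sigma=3.0).inv_cdf)
print(round(ku_unit_normal, 4), round(ku_other_normal, 4))
```

Under the stated reading the normal value comes to about 2.79, so that a Ku of 4.60, as found for [7:05], already indicates considerable leptokurtosis.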

The utility of measures of skewness is dis- 
cussed in Chapter VII, 3, in connection with the 
mode. 

In certain problems the selection of measures 
because of high reliability may have to yield to 
selection because of amenability to algebraic 
manipulation. This latter consideration may 
dictate the use of M, V, 44, Шу, ко GHP Gat 


la 
Fisher's unbiased k-statistics, К, (=i), k, (У), 


7 | iE 16) 2 
=“ I es 4 = 2: 
tC. aaan Decay We 


к? k 

2 а Жа 
8:-(81 75-75 Carel fs 0 

к к; 


Summarizing the points noted about the median,
we have:

(1) Its simple and direct meaning recommends
its use with popular audiences.

(2) Its standard error can be simply calcu-
lated.

(3) It is a terminal statistic and seldom
should be used in further computation for its
algebraic manipulation is difficult.

(4) Its reliability increases as the kurtosis
of the distribution increases, becoming greater
than that of the mean for pronouncedly lepto-
kurtic distributions.

(5) Though it does not lose its simple mean- 
ing in the case of skewed distributions, its 
greatest utility as a measure of central tendency 
is with distributions which are approximately 
symmetrical. 


SECTION 3. THE MODE 


For distributions which are nearly symmetrical
the practical issue generally is whether the 
measure of central tendency shall be the mean or 
the median, or whether both shall be used, and 
for skewed distributions it is whether it shall 
be the mode, the harmonic mean, or the geometric 
mean, or whether one of these and the mean shall 
be used. The mean holds a preferred position in 
both of these general situations because of its 
nice algebraic properties. 

The harmonic and geometric means are limited
to distributions, generally skewed, wherein zero
and negative values of the variate are impossible.
No such limitation attaches to the mean, median,
or mode, but the peculiar merit of the mode as
the measure of maximum likelihood is most ap-
parent and informative in the case of skewed
distributions only. We therefore discuss the
mode in connection with skewness.

Measures of asymmetry and of skewness: We 
will in this text call a measure of imbalance of 
a distribution a measure of asymmetry if it in- 
volves the units of measurement and a measure of 
skewness if it is independent of them. Thus μ_3
is a measure of asymmetry and μ_3/σ^3 is a measure
of skewness, for the former does and the latter
does not change as the units of measurement are
multiplied or divided by any constant. Pearson's
measure of skewness:

$$\beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}}$$    Pearson's β_1 measure of skewness*    [7:12]

Tables yielding the standard error of β_1 for
different-sized samples and different values of
β_1 and β_2 are given in Pearson's Tables for Sta-
tisticians and Biometricians.

Fisher has devised a very similar measure of
skewness based, not upon the sample variance and
third moment, but upon unbiased estimates of the
population variance and third moment. His measure
is

$$g_1 = \frac{k_3}{k_2^{3/2}}$$    Fisher's measure of skewness    [7:13]

A general formula for the variance error of g_1
is not available, though frequently needed when
dealing with clearly skewed distributions. Fisher
(Statistical Methods for Research Workers, 1925
et seq.) does provide the variance error in the
case of a sample drawn from a normal population
and this, of course, is all that is needed to
test departure from normality. It is

$$V_{g_1} = \frac{6N(N-1)}{(N-2)(N+1)(N+3)}$$    Variance error of g_1, normal population    [7:14]

This formula is recommended not only by its sim-
plicity, but because it applies to small as well
as large samples.
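
A test of departure from normality along these lines may be sketched as follows. The code is an illustration and not part of the original text. It assumes the usual expressions for Fisher's k-statistics, k_2 = Σx²/(N-1) and k_3 = NΣx³/[(N-1)(N-2)] with x a deviation from the sample mean, and the normal-population variance error 6N(N-1)/[(N-2)(N+1)(N+3)]. The function names and the data are hypothetical.

```python
import math

def fisher_g1(xs):
    """Fisher's g1 = k3 / k2**1.5, from the unbiased k-statistics."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    return k3 / k2 ** 1.5

def variance_g1_normal(n):
    """Variance error of g1 for a sample of N drawn from a normal population."""
    return 6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_right_tail = [1.0, 1.0, 1.0, 2.0, 10.0]

g1_symmetric = fisher_g1(symmetric)
g1_skewed = fisher_g1(long_right_tail)
# Critical ratio for the skewed series, judged against normal-theory error.
ratio = g1_skewed / math.sqrt(variance_g1_normal(len(long_right_tail)))
print(g1_symmetric, g1_skewed, ratio)
```

For a sample of 100 the variance error is about .0583, so that a g_1 of roughly .5 or more would be about twice its standard error.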


The critical ratios μ_3/σ_{μ_3} and k_3/σ_{k_3} are
serviceable as tests of asymmetry. Formulas
[6:61] and [6:62] give the variance error of μ_3
in case N is large. In the special case in which
N is large and the population normal it can
readily be shown that

$$V_{k_3} = \frac{6k_2^{3}}{N} = \frac{6V^{3}}{N}$$    Variance error of k_3, N large and population normal    [7:15]

* Not to be confused with Pearson's (Mo-M)/σ which he designates
as "skewness", the mode herein having been determined from a
fitted curve.

The Pearson β_1 and Fisher g_1 measures of skew-
ness suffer from extreme unreliability for ordi-
nary-sized samples whenever, as is quite common
in economic and psychological data, skewness is
associated with considerable leptokurtosis. In
these situations the simpler tests of asymmetry,
μ_3/σ_{μ_3} and k_3/σ_{k_3}, suffer for the same reason.


To serve with data of this skewed leptokurtic
sort we give the following percentile measures
of asymmetry and skewness:

$$As = \frac{P_{.07} + P_{.93}}{2} - P_{.50}$$    Percentile measure of asymmetry    [7:16]

The variance error of this measure is given here-
with, wherein p = .93 and p' = .50.


‚2 ‚2 H у: ‚2 Ч 
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For an abstract measure of skewness we have 


$$Sk = \frac{As}{Pv}$$    Percentile measure of skewness    [7:18]

The variance error of this measure is given here-
with
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:2 22 
i T ә! 
Vex = (руту) + As°V(Pv) + Pv As [pq(— -—) 
jee ven 
144 i Variance of per- P 3 
ОН ЕИ 
f. f 
P q Р 


Except in the case of β_1, which is always
positive, the sign of the other measures, μ_3,
μ_3/σ^3, k_3, g_1, As, and Sk, is positive and the
skewness is called positive when the long tail
of the distribution is to the right, i.e., toward
high values of the variate. The expression
"skewed to the right" has been variously used.
It can easily be avoided, but if used it is recom-
mended that it be synonymous with "positive skew-
ness" as here defined.
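
The percentile measures may be sketched in code as follows. This is an illustration and not part of the original text. It takes As as the mid-point of the 7th and 93rd percentiles less the median, Pv as the 7 to 93 interpercentile range, and Sk as their quotient. The percentile rule used (linear interpolation between ordered measures) is an assumption made for brevity and is not the grouped-data procedure of Chapter IV; the data are hypothetical.

```python
def percentile(xs, p):
    """Percentile by linear interpolation between ordered measures (assumed rule)."""
    ordered = sorted(xs)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def percentile_measures(xs):
    """Return (As, Pv, Sk): asymmetry, 7-to-93 range, and skewness As/Pv."""
    p07 = percentile(xs, 0.07)
    p50 = percentile(xs, 0.50)
    p93 = percentile(xs, 0.93)
    asymmetry = (p07 + p93) / 2.0 - p50
    pv = p93 - p07
    return asymmetry, pv, asymmetry / pv

symmetric = list(range(101))                # 0, 1, ..., 100
right_tailed = [x * x for x in range(101)]  # long tail toward high values

print(percentile_measures(symmetric))
print(percentile_measures(right_tailed))
```

The symmetrical series yields As = 0 and Sk = 0, while the series with the long tail toward high values yields positive As and Sk, in keeping with the sign convention just stated.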

We will illustrate these several measures in 
connection with the distribution of Table VII A. 


TABLE VII A

RELATIVE FREQUENCIES IN A PEARSON TYPE III, β_1 = 1.0,
β_2 = 4.5, DISTRIBUTION, FOR .2σ INTERVALS


    x       f         x       f         x       f
 Up to
  -1.8    .0008       .5    .0562      2.9    .0035
  -1.7    .0083       .7    .0474      3.1    .0027
  -1.5    .0247       .9    .0394      3.3    .0020
  -1.3    .0450      1.1    .0323      3.5    .0015
  -1.1    .0641      1.3    .0261      3.7    .0011
   -.9    .0784      1.5    .0209      3.9    .0008
   -.7    .0868      1.7    .0166      4.1    .0006
                     1.9               4.3
                     2.1               4.5
                     2.3               4.7
                     2.5               4.9
                     2.7               5.0
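
The legible entries of Table VII A can be checked by direct computation. The sketch below is not part of the original text. It rests on the assumption that a Pearson Type III curve with β_1 = 1.0 is a gamma distribution with shape 4 (for which β_1 = 4/shape and β_2 = 3 + 6/shape = 4.5), expressed in standard measure by subtracting its mean, 4, and dividing by its standard deviation, 2. The x values of the table are taken as class mid-points.

```python
import math

def gamma4_cdf(g):
    """Cumulative proportion below g for a gamma distribution, shape 4, scale 1."""
    if g <= 0.0:
        return 0.0
    return 1.0 - math.exp(-g) * (1.0 + g + g * g / 2.0 + g ** 3 / 6.0)

def standardized_cdf(x):
    """Cumulative proportion below x when x is in standard measure."""
    return gamma4_cdf(4.0 + 2.0 * x)   # mean 4, standard deviation 2

def interval_frequency(mid_point, width=0.2):
    """Relative frequency in the interval of the given width about mid_point."""
    return (standardized_cdf(mid_point + width / 2.0)
            - standardized_cdf(mid_point - width / 2.0))

up_to = standardized_cdf(-1.8)
checks = {x: interval_frequency(x) for x in (-1.7, -1.1, -0.9, 0.5)}
print(round(up_to, 4), {x: round(f, 4) for x, f in checks.items()})
```

Under these assumptions the computed frequencies agree to four places with the tabled .0008, .0083, .0641, .0784, and .0562.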
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The various derived statistics herewith given 
have been computed from the exact distribution or
from Salvosa's (1930) tabulation in .01σ inter-
vals so that an exact agreement with computations
based upon Table VII A is not to be expected.
We assume a sample of size 100.


ОО А. 50 Eph = 1.233; 
1,01 
РЕ. ИР, БЕ 23.854; 
foi 
1. o 1.93 
f 72.989; = 11.033; Pv - 2.854 
.50 .93 


у= 1.00; м, 74.50; V,7.0849; У, = .020; 
As = .358; Sk = .1254; μ_3 = 1.00; β_1 = 1.00
V_{As} = .02506; V_{Sk} = .004128; V_{μ_3} = .3600; V_{β_1} = .6750


Critical Ratios


As/σ_{As} = 2.26; Sk/σ_{Sk} = 1.95; μ_3/σ_{μ_3} = 1.67; β_1/σ_{β_1} = 1.22


The decreasing size of these four critical
ratios is evidence of a decreasing efficiency in
As, Sk, μ_3, and β_1 as measures of asymmetry. Also
the less complicated measures As and μ_3 are more
efficient than the abstract measures Sk and β_1.
The efficiency of measures changes with change
in form of distribution (See Fisher, 1921) but
it seems safe to believe that for a wide class
of skewed distributions As is more efficient
than μ_3 as a measure of asymmetry. It is thus
recommended for use in a test of asymmetry, but
of course it is not serviceable for further
computational work, for its algebraic combina-
torial possibilities are very limited.

Comparison of mean and mode: How skewed
a distribution should be to warrant the compu- 
tation of mode rather than, or in addition to, 
the mean depends upon (a) which is the more re- 
liable and (b) whether the distinction between 
them is possible and important for the issue in 
hand. The matter of reliability is accurately 
handled by comparing the standard error of the 
mode with that of the mean. Since there are 
several ways of computing the mode, both it and 
its standard error will depend upon the compu-
tational method followed. Pearson's method is
to fit a curve with few constants (coefficients
and exponents) to the entire distribution and
then determine the mode and its standard error
from this fitted curve. The merit of this pro-
cedure for distributions which are found by test
(a χ² goodness-of-fit test of advanced statistics)
well to fit a curve which is defined by just a
few constants is granted, but the writer ques- 
tions its general validity for psychological, 
educational, economic, and social data. In these 
fields the units of measurement are seldom natu- 
ral, intrinsic, or inviolable. They are kept if 
found useful and discarded if not. Furthermore, 
there is generally no guarantee that the units 
which one happens to use retain a similar or 
constant merit throughout low, intermediate, and 
high ranges of values. We commonly, therefore, 
desire a mode which is not a function of the 
entire data, but only of that portion residing 
in the neighborhood of the mode. А computational 
method yielding such a mode, together with its 
standard error, is given later in this section. 
That this computational procedure is simpler 
than Pearson's is not the chief argument in its 
favor, which is that it yields a measure of cen- 
tral tendency which is important in itself, even 
though the entire distribution may not be defined
and even though the significance of the units of
measurement changes gradually when passing from
low to high values. The standard error of the
mode as thus computed may be less or greater than
that of a Pearson mode, depending upon the nature
of the distribution.

Maximum likelihood: A certain class of sta-
tistics is known as "maximum likelihood" measures.
Think of a sample as derivable from any of a
number of parent populations, all of the same
type. The probability that the observed distri-
bution is a sample from one of these is greater
than that it is from any other one, so this
particular parent may be called the maximum
likelihood population. Its definition, i.e., the
numerical values of the constants, or parameters,
entering into its equation, is obtainable from
the observed distribution. A parameter so ob-
tained is a maximum likelihood statistic.

A unimodal population will yield sample distri-
butions, and the observed mode of a sample is
more likely to have arisen if the sample is one
drawn from a parent with this same mode than if
drawn from any other parent. Thus the sample
mode, defining with maximum likelihood the parent
mode, is a maximum likelihood statistic. This is
true when the only assumption about the parent is
that it has a mode.

The matter of maximum likelihood can be here 
merely touched upon, but this should suffice to 
inform the beginning student that the modal value 
of any statistic has certain important probability 
properties that are not possessed by any other 
value of the statistic. 

That the mode is a natural point of reference 
is suggested by the simplicity of form that 
certain curves of the Pearson type take when the 
mode is made the origin. The writer has shown 
(1940) that certain invariant properties exist 
for modal age-grade norms that do not exist for 
other norms. Its general and genuine merit is
only clouded by the greater mathematical diffi-
culty of manipulating maximum likelihood rather
than mean statistics. For the reasons given it
would seem that the distinction between the mode
and the mean is an important one to make when it
can be made with certainty and when an issue of
central tendency is involved. We cannot relate
this matter to the degree of skewness of a dis-
tribution, for a minute skewness is determinable
if the sample is large enough. The statistical
establishment of the mode as different from the
mean is involved whether the mode is calculated
by a fitted-curve method (in which case facili-
tating tables are given in Pearson's Tables for
Statisticians and Biometricians) or by the simp-
ler method here given. It would seem in general
to be sufficient in the case of unimodal distri-
butions to anticipate a useful distinction be-
tween mean and mode when asymmetry, as measured
by As or μ_3, is established.


The computation of the mode: In Chapter IV
an exercise was given in grouping so that the
mode shown in a graphic presentation would have
a known dependability. We here will give an
algebraic method which is an extension of this
practice. With fine grouping many modes charac-
teristically appear in the sample and the crude
mode has little to commend it over neighboring
values. With coarse grouping the possible sample
crude mode values are few and far apart, so a
close agreement with the population mode cannot
be expected. The moving-average method of com-
puting the mode is intended to remedy the defects
of both fine and coarse grouping. Its process,
merits, and defects can be illustrated by the
data of Table VII B.

Certain of these lengths occur several times,
but none should be called the crude mode, because
none is outstanding. We first construct a fre-
quency distribution, using a fine interval, so
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TABLE VII B 
HEAD LENGTHS, IN MM., OF 50 SIXTEEN-YEAR-OLD 
DELINQUENT BOYS 


204 204 184 195 184 
183 192 185 193 188 
186 189 187 192 189 
190 201 191 190 194 
181 186 206 179 191 
188 178 187 182 189 
180 187 185 187 203 
182 197 190 194 183 
200 193 196 184 186 


E а 21118077 18. 


small that a computation error of one-half this
interval, since the moving-average mode must be
a class index or in the case of equal frequencies
in neighboring classes a class boundary, is im-
material in the light of the problem and the size
of the sample. We cannot choose an interval less
than one millimeter, but this seems sufficiently
small. We obtain the first two columns of Table
VII C from Table VII B.


TABLE VII C 


FREQUENCY TABLE OF LENGTHS, IN MM., OF HEADS OF 50 
SIXTEEN-YEAR-OLD DELINQUENT BOYS, ALSO FREQUENCIES 
FOR DIFFERENT 2, 3, 4, and 5 MM. GROUPINGS 


[Table VII C body not reproduced: columns give the length in mm. and the frequency for the various class intervals.]
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The term, moving average, suggests the method 
of computation. For example, for the interval 5 
mm. the first three values are obtained thus: 


12 is the sum of 2, 2, 3, 2, 3
15 is the sum of 2, 3, 2, 3, 5
15 is the sum of 3, 2, 3, 5, 2


The first 15 may be gotten by dropping the 2 and 
adding the 5 to the value 12, the second 15 by 
dropping the 2 and adding the 2 to the first 15, 
etc. 
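
The dropping and adding just described can be sketched in code. The example is an illustration and not part of the original text; the function name is hypothetical, and the seven frequencies are those entering the three sums described above.

```python
def moving_sums(frequencies, width):
    """Sums of each run of `width` neighboring class frequencies.

    Each sum after the first is obtained from its predecessor by dropping
    the frequency leaving the run and adding the one entering it.
    """
    total = sum(frequencies[:width])
    sums = [total]
    for k in range(width, len(frequencies)):
        total += frequencies[k] - frequencies[k - width]
        sums.append(total)
    return sums

one_mm_frequencies = [2, 2, 3, 2, 3, 5, 2]
five_mm_sums = moving_sums(one_mm_frequencies, 5)
print(five_mm_sums)
```

The three sums obtained are 12, 15, and 15, the second and third being tied, which is the kind of competing mode the text goes on to discuss.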

With the 1 mm. interval there is no mode with-
out a near competing mode, so a solution is not
reached. Nor is there a solution with the 2 mm.
grouping. With the 3 mm. interval a single mode 
is found and its value is 188.0. According to 
the usual procedure this would be taken as the 
answer. The coarser groupings are given to be 
illustrative of the uncertainty of the solution. 
No unambiguous mode is revealed with the 4 mm. 
grouping, but the value 188.0 appears with the 
3 mm. interval. Some follow the practice of 
taking moving averages of moving averages, with 
doubtful improvement in outcome. The standard 
error of the mode computed by any of the moving- 
average methods is unknown. 

We will compute the mode by determining the 
maximum point of the best fit parabola to the 
frequencies in the four neighboring classes whose 
sum of frequencies is greatest. The data of
Table VII E show that this parabolically smoothed
mode is a very serviceable approximation to the
Pearson fitted curve mode. As one would scarcely
consider using the mode as a measure of central
tendency in the case of a flat-topped distribu-
tion, the failure of formula [7:24] in such cases
is not important.

The size of the interval employed is of prime 
importance. We will determine the number of 
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classes and the class boundaries according to 
the rules already given in Chapter IV for graphic 
portrayal. These have, in general, been found 
to be meritorious for the purpose of computing 
the mode. We cannot say that the interval given 
by these rules is the best, for what is best 
depends upon the nature of the true distribution 
for the phenomena in question. Investigation, 
for the normal distribution, of the optimal size 
of interval to use in determining the parabolic- 
ally smoothed mode so as to minimize the standard 
error of the discrepancy between the frequency
at the mode as given by the parabola and the true
modal frequency has resulted in the values of
Table VII D. The computational procedure was
such that the results are only approximate,
though the error in the intervals given is cer-
tainly less than .1σ.


TABLE VII D


OPTIMAL INTERVAL, IN TERMS OF THE POPULATION σ, FOR
THE COMPUTATION OF THE PARABOLICALLY SMOOTHED MODE,
FOR SAMPLES OF SIZE N DRAWN FROM A NORMAL POPULATION


N                   22    30    60   125   200   400  1300

Size of interval   1.1   1.0    .9    .8    .7    .6    .5

A comparison with Table XIII H shows that
these intervals are considerably larger than
there given. Obviously the more leptokurtic the
curve the smaller the optimal interval when ex-
pressed in terms of the standard deviation, and
as typically the merit of the mode is associated
with skewed leptokurtic data, the smaller (though
still wide) intervals of Table XIII H are to be
preferred to those of Table VII D.

Having found a practical solution for this
troublesome matter of size of interval, we will
now give the necessary formulas and compute:

Mo = the parabolically smoothed mode

σ_{Mo} = the standard error of this mode

f_{Mo} = the modal frequency

and σ_{f_{Mo}} = the standard error of this frequency.


From the general nature of most non-cuspidal
unimodal frequency distributions of continuous
variables, it is inferred that a second-degree
parabola can be made to fit closely to these
curves in the neighborhood of their modes. We
will accordingly determine the parabola that best
fits the four points closest to the mode to be
determined. Let us use an arbitrary scale, called
a ξ scale, in terms of which the values of the
class indexes of the four classes in question
are -1.5, -.5, .5, and 1.5 respectively, and let
the frequencies of these classes be represented
by f_1, f_2, f_3, and f_4. One unit upon this arbi-
trary scale in ξ is equal to i units in the
original data, and the origin of the ξ scale is
the boundary between the second and third classes.
The algebraic relation between X and ξ is

$$X = \text{Arbitrary origin} + i\,\xi$$    [7:20]

If the mode is determined in ξ units, then its
value in X units is given by

$$Mo_X = \text{Arbitrary origin} + i\,Mo_\xi$$    [7:21]


The equation of the parabola which best fits
(in the least squares sense) the four points
$(-1.5, f_1)$, $(-.5, f_2)$, $(.5, f_3)$, $(1.5, f_4)$ is


$f = \frac{1}{16}(-f_1 + 9f_2 + 9f_3 - f_4) + \frac{1}{10}(-3f_1 - f_2 + f_3 + 3f_4)\,\xi - \frac{1}{4}(-f_1 + f_2 + f_3 - f_4)\,\xi^2$ . . . . . [7:22]


The equation of the parabola which best preserves
the areas, in the least squares sense, in
the four intervals is


$f = \frac{1}{12}(-f_1 + 7f_2 + 7f_3 - f_4) + \frac{1}{10}(-3f_1 - f_2 + f_3 + 3f_4)\,\xi - \frac{1}{4}(-f_1 + f_2 + f_3 - f_4)\,\xi^2$ . . . . . [7:23]


which, it will be noted, differs from [7:22] only
in the constant term. It thus has the same mode
as [7:22].


The mode, or value of $\xi$ for which the frequency
is a maximum, is


$Mo_\xi = \dfrac{-.6f_1 - .2f_2 + .2f_3 + .6f_4}{-f_1 + f_2 + f_3 - f_4}$ in $\xi$ units . . . . . [7:24]


Accordingly, substituting this value in [7:21]
yields the answer which we will call the parabolically
smoothed mode. If the denominator of
[7:24] is negative the sample has an anti-mode
in this region, so we have evidence of bimodality
and should consider the method inapplicable.
Substituting $Mo_\xi$ for $\xi$ in [7:23] yields the
modal frequency per $i$ interval


$f_{Mo}$ per $i$ interval $= \frac{1}{12}(-f_1 + 7f_2 + 7f_3 - f_4) + \frac{1}{4}(-f_1 + f_2 + f_3 - f_4)(Mo_\xi)^2$ . . . . . [7:25]


Of course

$f$ per $i$ interval $= i\,(f$ per unit (in $X$) interval$)$ . . . . . [7:26]


The variance error of $Mo_\xi$ is approximately given
by [7:27]*


$V_{Mo_\xi} = \dfrac{(f_1 + f_2 + f_3 + f_4)(.2 + Mo_\xi^2)}{(-f_1 + f_2 + f_3 - f_4)^2}$ . . . . . Variance error of $Mo_\xi$ [7:27]


$V_{Mo_X} = i^2\,V_{Mo_\xi}$ . . . . . Variance error of parabolically smoothed mode [7:28]
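
The computation described by [7:21], [7:24], [7:27], and [7:28] can be sketched in a few lines. This is an illustrative sketch only: the function name and argument names are hypothetical, the coefficients are those of the least-squares parabola through the class indexes $-1.5$, $-.5$, $.5$, $1.5$, and the frequencies in the usage line are made up for the example.

```python
def parabolic_mode(f1, f2, f3, f4, origin, i):
    """Parabolically smoothed mode and its variance error, in X units.

    f1..f4 : frequencies of the four classes nearest the mode
    origin : X value of the boundary between the second and third class
    i      : class interval in X units
    (All names are illustrative, not taken from the text.)
    """
    num = -0.6 * f1 - 0.2 * f2 + 0.2 * f3 + 0.6 * f4   # numerator of [7:24]
    den = -f1 + f2 + f3 - f4                            # denominator of [7:24]
    if den <= 0:
        # A non-positive denominator means no maximum in this region
        # (an anti-mode when negative), so the method is inapplicable.
        raise ValueError("no mode in this region; method inapplicable")
    mo_xi = num / den                                   # mode in xi units
    v_xi = (f1 + f2 + f3 + f4) * (0.2 + mo_xi ** 2) / den ** 2   # [7:27]
    return origin + i * mo_xi, i ** 2 * v_xi            # [7:21] and [7:28]


# Made-up frequencies lying on a parabola whose maximum is at xi = .3
mode, variance_error = parabolic_mode(67.6, 93.6, 99.6, 85.6, origin=10.0, i=2.0)
standard_error = variance_error ** 0.5
```

The standard error of the mode is the square root of the variance error returned.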


* Calculation of the standard error of the parabolically
smoothed mode


We have

$Mo_\xi = \dfrac{-.6f_1 - .2f_2 + .2f_3 + .6f_4}{-f_1 + f_2 + f_3 - f_4} = \dfrac{Num}{Den}$

Taking logarithmic differentials (see Chapter XIII, Section
8) we have

$\dfrac{d\,Mo_\xi}{Mo_\xi} = \dfrac{d\,Num}{Num} - \dfrac{d\,Den}{Den}$

Squaring, summing, and dividing by the number of samples
summed, we have

$\dfrac{V_{Mo_\xi}}{Mo_\xi^2} = \dfrac{V_{Num}}{Num^2} + \dfrac{V_{Den}}{Den^2} - \dfrac{2\,C_{Num,Den}}{Num \cdot Den}$

We let $f_1/N = p_1$ and $q_1 = 1 - p_1$. Similar definitions hold for
$p$'s and $q$'s with other subscripts. We also let
$\bar{p} = (p_1 + p_2 + p_3 + p_4)/4$.
Utilizing relationships of the types
$V_{f_1} = N p_1 q_1$ and $C_{f_1 f_2} = -N p_1 p_2$,

$V_{Num} = N(.36p_1q_1 + .04p_2q_2 + .04p_3q_3 + .36p_4q_4 - .24p_1p_2 + .24p_1p_3 + .72p_1p_4 + .08p_2p_3 + .24p_2p_4 - .24p_3p_4) \doteq .8N\bar{p}$

As a check upon the excellence of the mode as
given by [7:24] and [7:21] we may compare it as
thus computed with the value as determined by 
more refined methods. Volume one of Biometrika 
was scanned to find illustrations of data to 
which Pearson curves had been fitted and the 
mode determined from the fitted curves. Articles 
by Powys (1901), Macdonnell (1901), Pearson (1902, 
Sys.), and Fawcett (1902) serve excellently. 
The data by Powys are particularly valuable be- 
cause of the large size of samples. Testing the 
parabolically smoothed mode against the mode 
from the fitted curve for these large samples is 
a severe test so far as systematic differences 
are concerned. Examination of the results as 
shown in Table VII E reveals no systematic error, 
and it further shows that the chance error is 
adequately represented by the formula given, 
[7:28], for the variance error of the parabolic- 
ally smoothed mode. 


FOOTNOTE CONTINUED:

$V_{Den} = N(p_1q_1 + p_2q_2 + p_3q_3 + p_4q_4 + 2p_1p_2 + 2p_1p_3 - 2p_1p_4 - 2p_2p_3 + 2p_2p_4 + 2p_3p_4) \doteq 4N\bar{p}$

$C_{Num,Den} = N(.6p_1q_1 - .2p_2q_2 + .2p_3q_3 - .6p_4q_4 + .4p_1p_2 + .8p_1p_3 + 0 + 0 - .8p_2p_4 - .4p_3p_4) \doteq 0$

Hence

$V_{Mo_\xi} \doteq \dfrac{4N\bar{p}\,(.2 + Mo_\xi^2)}{Den^2} = \dfrac{(f_1 + f_2 + f_3 + f_4)(.2 + Mo_\xi^2)}{(-f_1 + f_2 + f_3 - f_4)^2}$ . . . . . [7:27]


TABLE VII E

COMPARISON OF FITTED CURVE AND PARABOLICALLY SMOOTHED MODES

[Table body not recoverable. Column groups: fitted curves method; parabolically smoothed modes. Data series include Powys, stature in inches of New South Wales male criminals, by age group; Macdonnell, head breadth of criminals; Pearson; and Fawcett, breadth of skulls.]


A general formula for the variance error of
the parabolically smoothed modal frequency is
involved. It is simple for the special case in
which the true value of $Mo_\xi = 0$. We then have


$V(f_{Mo}$ per $i$ interval$) = \dfrac{1}{144N}\,[\,f_1(N - f_1) + 49f_2(N - f_2) + 49f_3(N - f_3) + f_4(N - f_4) + 14f_1f_2 + 14f_1f_3 - 2f_1f_4 - 98f_2f_3 + 14f_2f_4 + 14f_3f_4\,]$ . . . . . Variance of $f_{Mo}$ when $Mo_\xi = 0$ [7:29]


When $Mo_\xi$ does not equal zero all we can assert
is that the variance error of ($f_{Mo}$ per $i$ interval)
is greater than the value given by [7:29].


SECTION 4. THE GEOMETRIC MEAN 


The use of the geometric mean as an average 
might be suggested by discovery that, for the 
data in question, it was more reliable than the 
other common averages. However, the determina- 
tion of this may be a very complicated under- 
taking. In ordinary practice the logic of the 
situation rather than an objective measure of 
reliability has been the consideration which has 
led to the use of the geometric mean. Though in 
general we believe this has been sound practice, 
the statistician would be more satisfied if the 
use of the geometric mean were fortified by dem- 
onstrable evidence of reliability. 

Data for which the geometric mean seems appropriate
have the following characteristics:
(a) The raw measures are all positive and are 
deviations from an indubitable and meaningful 
zero point. (b) Thinking is facilitated when 
relative, not absolute, amounts are kept in mind. 
(c) It follows as a consequence that certain 
tests of consistency (e.g., the time and quantity 
reversal tests applied to index numbers) are
frequently satisfied by the geometric mean and
not by alternative averages. In the following 
discussion the reader should think of the raw 
X scores as fulfilling conditions (a) and (b). 

If $X_1, X_2, \ldots X_N$ are the measures in a
series, and if $\pi X$ stands for the product of them
all, we have, by definition,


$G.M. = (\pi X)^{1/N}$ . . . . . [7:30]

Taking logarithms we have

$\log (G.M.) = \dfrac{\log X_1 + \log X_2 + \cdots + \log X_N}{N} = \dfrac{\Sigma \log X}{N} = \dfrac{\Sigma Y}{N} = M_Y$
where $Y$ is the logarithm of $X$. Thus corresponding
to a geometric mean of the $X$'s is an arithmetic
mean of the $Y$'s and all the error and probability
properties attaching to the arithmetic mean can,
by utilizing the one-to-one relationship that
exists, be attached to the geometric mean. The
probability that $\widetilde{G.M.}$ (the true G.M.) exceeds
the obtained G.M. by an amount $\Delta$ is exactly the
probability that $\tilde{M}_Y$ exceeds $M_Y$ by an amount $\delta$,
the relationships between quantities being

$\log G.M. = M_Y$; $\log \widetilde{G.M.} = \tilde{M}_Y$; $\Delta = \widetilde{G.M.} - G.M.$; $\delta = \tilde{M}_Y - M_Y$

The form of distribution of the $Y$ measures and
the variance error of $M_Y$ $(= V_Y/(N-1))$ are as simple
to determine as the form of distribution and
mean of any raw measures. This relationship
between measures and their logarithms suggests
the following practice:

When the geometric mean is used, interpret
relationships between measures in relative terms,
but judge of the confidence to be placed in the
geometric mean via the confidence to be placed
in its logarithm.
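
The practice just stated can be sketched as follows: compute the mean of the logarithms, judge its reliability on the logarithmic scale, and only then return to the original scale. The function names, the sample measures, and the use of a normal multiplier of 1.96 are assumptions made for illustration; the variance error of the mean logarithm is taken as the variance of the logarithms (divisor N) divided by N - 1.

```python
import math


def geometric_mean(xs):
    """Antilogarithm of the arithmetic mean of the natural logarithms."""
    if any(x <= 0 for x in xs):
        raise ValueError("all measures must be positive")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def geometric_mean_limits(xs, z=1.96):
    """Limits for the G.M. obtained from limits on the mean logarithm.

    z = 1.96 is an assumed normal multiplier, chosen for illustration.
    """
    ys = [math.log(x) for x in xs]
    n = len(ys)
    m_y = sum(ys) / n
    v_y = sum((y - m_y) ** 2 for y in ys) / n      # variance of the logs
    se_m_y = math.sqrt(v_y / (n - 1))              # standard error of M_Y
    return math.exp(m_y - z * se_m_y), math.exp(m_y + z * se_m_y)


measures = [1.0, 10.0, 100.0]                      # made-up positive measures
gm = geometric_mean(measures)
low, high = geometric_mean_limits(measures)
```

The limits are symmetric about the mean logarithm but not about the G.M. itself: the ratio of the upper limit to the G.M. equals the ratio of the G.M. to the lower limit, which is the relative interpretation the practice calls for.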

Confidence as here used is that due to the 
nature and size of the sample. In many, perhaps 
most, situations wherein the geometric mean is 
appropriate a sample does not exist. If the 
population of a city in 1930 is 152760 and in 
1940 it is 186210, we do not say that we have a 
sample of two. We have two observations in a 
time series and these observations at different 
moments of time are not assumed to be two obser- 
vations of the same thing drawn from an infinite 
population. We may ask, "What is the average 
yearly rate of growth?" The ten-year growth is 
33450. We do not divide this by ten and call 
the yearly growth 3345, but assume a constant 
yearly rate of growth. 


Population in 1930 = 152760

Assume in 1931 it = $152760(1+r)$

Assume in 1932 it = $152760(1+r)^2$
etc.


Assume in 1940 it = $152760(1+r)^{10} = 186210$


Taking logarithms and solving, $r = .0200$.
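
The solution for r under the constant-rate assumption can be checked with a short computation. This is a minimal sketch; the function name is illustrative and only the two population figures come from the text.

```python
def average_yearly_rate(start, end, years):
    """Constant yearly rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1.0 / years) - 1.0


r = average_yearly_rate(152760, 186210, 10)     # geometric (rate) answer
naive_yearly_growth = (186210 - 152760) / 10    # arithmetic answer, 3345.0
```

The arithmetic figure assumes the same number of persons is added each year; the rate r assumes the same proportion is added each year, which is the assumption adopted here.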

We find an average increase of 2 per cent a year. 
Had we the following data: Mean vocabulary of
100 I.Q. sixteen-year-olds 15276 words, and of
110 I.Q. sixteen-year-olds 18621 words, we would
not have dealt with rates but have said that
there was an average increase of 334.5 words per
unit increase in I.Q. We note that which procedure
is appropriate has depended upon our
general knowledge of the situations which are
considerations outside the numerical values of 
the measures themselves. We may believe this 
approach justified when dealing with statistical 
series which are not similar measures drawn from 
an infinite population. This situation is very 
common when dealing with temporal and geographic 
data. 


The time reversal test is of proven import- 
ance in connection with economic indexes. One 
would anticipate its importance in connection 
with fatigue, learning, growth, and sundry socio- 
logical phenomena of a temporal nature, provided 
measurements from a true zero point are available.
In economics the test requires that if a
price (or quantity) index based upon a number of
commodities shows a certain relationship between
a first and a second date, the reciprocal relationship
is to be shown between the second and
the first date. Consider the following data
covering a stable, 1913, and an inflated, 1918, 
period. 


TABLE VII F
PRICES AND PRICE RATIOS OF CERTAIN COMMODITIES,
1913, 1918

                      Prices              Price Ratios
                   1913     1918     1913 on 1918  1918 on 1913

Bacon             .1236    .2612        .47320       2.11327
Pork              .1486    .2495        .59559       1.67900
Lard              .1101    .2603        .42297       2.36421

Arithmetic                              .49725*      2.05216*
price index       .12743   .25700       .49584

Geometric                               .49216#      2.03188#
price index       .12646   .25694       .49215


* These values are the arithmetic means of preceding ratios.


# These values are the geometric means of preceding ratios.

The ratio of the price of bacon in 1913 to
that in 1918, .47320, is the reciprocal of the
ratio of the 1918 price to that of 1913, 2.11327,
so the time reversal test holds for prices of 
single commodities. The arithmetic average of 
the 1913 ratios, .49725, is not the reciprocal 
of the arithmetic average of the 1918 ratios, 
2.05216, so the time reversal test does not hold 
for these mean ratios. Also the value of the 
mean 1913 on 1918 ratios, .49725, does not equal 
the ratio of the means, .49584. 

In the case of the geometric mean of the price
ratios, the time reversal test does hold, for
.49216 is the reciprocal of 2.03188. Also the
ratio of the geometric price indexes, .49216, is
identical with the geometric mean of the separate
price ratios. It is left to the student as an
exercise to prove that this is necessarily so.
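
The time reversal test can be tried numerically on the prices of Table VII F. The sketch below is an illustration, not a proof: it multiplies the forward and backward mean ratios, a product that equals 1 exactly when the test holds. The helper names are illustrative.

```python
import math

prices_1913 = {"bacon": 0.1236, "pork": 0.1486, "lard": 0.1101}
prices_1918 = {"bacon": 0.2612, "pork": 0.2495, "lard": 0.2603}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


ratios_1913_on_1918 = [prices_1913[c] / prices_1918[c] for c in prices_1913]
ratios_1918_on_1913 = [prices_1918[c] / prices_1913[c] for c in prices_1913]

# Time reversal test: forward mean ratio times backward mean ratio should be 1.
am_product = arithmetic_mean(ratios_1913_on_1918) * arithmetic_mean(ratios_1918_on_1913)
gm_product = geometric_mean(ratios_1913_on_1918) * geometric_mean(ratios_1918_on_1913)

# Ratio of the geometric price indexes versus geometric mean of the ratios.
gm_index_ratio = geometric_mean(list(prices_1913.values())) / geometric_mean(list(prices_1918.values()))
gm_of_ratios = geometric_mean(ratios_1913_on_1918)
```

For these data the geometric product is 1 to within rounding while the arithmetic product exceeds 1, and the ratio of the geometric indexes agrees with the geometric mean of the separate ratios.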

In this illustrative example the three com- 
modities have been weighted equally (each weighted 
1), but clearly if weighted unequally all that 
is necessary is that each commodity be taken as 
many times as the weight, and then, proceeding 
as here shown, we again find the time reversal 
test holding for the geometric means. There are 
other price indexes than the geometric mean for 
which the time reversal test holds, but none 
which are more simple algebraically. 

The ideal distribution of the cases consti- 
tuting a sample in which to use the geometric 
mean is one in which the logarithms of the measures
are distributed normally. The data of Table
VII G yield such a distribution. 

Because of the extreme skewness of these data
it has been necessary to report frequencies for
unequally spaced intervals. A plot of these
data as a frequency polygon will be informative
in strikingly revealing the form of distribution
for which the geometric mean is most appropriate.
If logarithms of these X measures to the base e
are found and a distribution made, it will be
normal, with a mean of 2.0 and a variance of 1.0,
and these statistics will have small standard
errors. For the X measures the mean and variance
would be, of course, very poor statistics.


TABLE VII G

DISTRIBUTION OF 1000 MEASURES THE LOGARITHMS OF
WHICH YIELD A NORMAL DISTRIBUTION

     X        ORDINATE     INTERVAL     FREQUENCY PER INTERVAL
                 z             i              1000 z i

    .167                   .166667              +
    .333                   .166667              2
    .500                   .166667              4
    .667      .0331        .166667              6
   1.0        .0539        .5                  27
   1.5        .0745        .5                  37
   2.0        .0848        .5                  42
   2.5        .0886        .5                  44
                  (e = mode)
   3.0        .0884        .5                  44
   3.5        .0861        .5                  43
   4.0        .0825        .5                  41
   4.5        .0782        .5                  39
   5.0        .0738        .5                  37
   5.5        .0693        .5                  35
   6.0        .0653        .5                  33
   6.5        .0608        .5                  30
   7.0        .0568        .5                  28
   7.5                     .5                  27
   8.0                     .5                  25
   8.5                     .5                  23
   9.0                     .5                  22
   9.5                     .5                  20
  10.5                    1.5                  54
  12.0                    1.5                  44
  13.5                    1.5                  37
  15.0                    1.5                  31
  16.5                    1.5                  26
  18.0        .0149       1.5                  22
  20.0        .01218      2.5                  31
  22.5        .00958      2.5                  24
  25.0        .00766      2.5                  19
  27.5        .00619      2.5                  15
  30.0        .00506      2.5                  13
  32.5        .00409      2.5                  10
  35.0        .00339      2.5                   9
  38.0        .00274      3.5                  10
  42.0        .00210      4.5                  10
  47.0        .00153      5.5                   8
  53.0        .00108      6.5                   7
  60.0        .00074      7.5                   6
  68.0        .00050      8.5                   4
  77.0        .00033      9.5                   3
  87.0        .000219    10.5                   2
  98.0        .000144    11.5                   2
 112.0        .000088    16.5                   2
 133.0        .000046    25.5                   1
 171.0        .000017    50.5                   1
 249.0        .000003   101.5                   +
 450.0        .000000+  300.5                   +
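
The ordinates of Table VII G can be approximated directly from the stated condition that the natural logarithms are normal with mean 2.0 and variance 1.0. The sketch below uses the standard density of a variable whose logarithm is normal; the function names are illustrative, and the printed ordinates, presumably taken from tables of the normal curve, may differ from these values by a unit or so in the fourth decimal.

```python
import math


def ordinate(x, mean_log=2.0, var_log=1.0):
    """Relative frequency per unit of X when log_e(X) is normally distributed."""
    sd = math.sqrt(var_log)
    z = (math.log(x) - mean_log) / sd
    return math.exp(-0.5 * z * z) / (x * sd * math.sqrt(2.0 * math.pi))


def frequency(x, interval, n=1000):
    """Approximate frequency in a class of the given width centered at x."""
    return n * ordinate(x) * interval


at_mode = ordinate(math.e)          # the mode falls at X = e for these constants
near_mode = ordinate(2.5), ordinate(3.0)
f_2_5 = frequency(2.5, 0.5)
```

Taking logarithms of the class indexes and plotting frequency per unit of log X would display the symmetric normal form that the X scale conceals.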


SECTION 5. THE HARMONIC MEAN 


By definition the harmonic mean, $HM$, of a
series of measures, $X$, is the reciprocal of the
mean of the reciprocals, thus,


$\dfrac{1}{HM} = \dfrac{1}{N}\,\Sigma \dfrac{1}{X}$ . . . . . Harmonic Mean [7:31]


All $X$ measures must be positive. Let $Y = 1/X$.
To compute the HM we compute the $Y$ values, obtain
their mean, and take the reciprocal. Having a
distribution of $Y$ values we can obtain $M_Y$
and $V_{M_Y}$ by procedures already given. Thus, to
obtain a measure of confidence in the HM, we
first secure a measure of the reliability of $M_Y$.
For any fiducial limits in HM that we may be
interested in, there are corresponding limits
in $M_Y$ and we judge the trustworthiness of $M_Y$ via
methods already given for testing the significance
of the arithmetic mean.


The HM is serviceable in connection with
"work limit" measures. In the phraseology of
psychology a test calling for a fixed amount of
work, and scored on the basis of time required to
accomplish it, is called a "work limit" test. It
stands in distinction to a "time limit" test
which is one having a fixed time limit and scored
on the basis of amount done. Let us consider the
case of individuals A, B, and C, able to accomplish
30, 40, and 50 units of work, respectively,
in one hour. If tested by means of a half-hour
time-limit test, their amount-done scores
will be 15, 20, and 25. If tested by means of a
60-item work-limit test, their time-required
scores will be 120, 90, and 72 minutes respectively,
showing quite different relationships than
do the time-limit test scores. The time-limit
data yield an average score of 20 units in 30
minutes, which is at the rate of one unit in 1.5
minutes. We desire to get the same information
from the work-limit scores, but their mean, 94,
equivalent to a rate of one unit done in 1.5667
minutes, does not give it. For the HM we have


$\dfrac{1}{HM} = \dfrac{1}{3}\left(\dfrac{1}{120} + \dfrac{1}{90} + \dfrac{1}{72}\right) = \dfrac{1}{90}$


Thus the HM time required to perform 60 units of
work is 90 minutes, which is at the rate of one
unit in 1.5 minutes, checking with the time-limit
test results.
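
The work-limit example can be verified with a few lines. This is a minimal sketch; the function name is illustrative and the figures are those of individuals A, B, and C above.

```python
def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(x <= 0 for x in xs):
        raise ValueError("all measures must be positive")
    return len(xs) / sum(1.0 / x for x in xs)


minutes_for_60_items = [120, 90, 72]             # work-limit scores of A, B, C
units_in_half_hour = [15, 20, 25]                # time-limit scores of A, B, C

hm_time = harmonic_mean(minutes_for_60_items)    # minutes per 60 items
mean_time = sum(minutes_for_60_items) / 3        # arithmetic mean, for contrast

rate_from_hm = hm_time / 60                      # minutes per unit of work
rate_from_time_limit = 30 / (sum(units_in_half_hour) / 3)
```

The harmonic mean of the times reproduces the rate given by the time-limit scores, whereas the arithmetic mean of the times does not.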

To determine which mean to employ the student
must decide for his particular situation which
mean is the more immediately interpretable: that
whose base is constant time (the time-limit test
score), or whose base is constant product (the
work-limit test score). Ordinarily, the meaning
in the reader's mind of a unit of time is both
precise and comparable from one situation to
another, but the meaning of a unit of work, even
when precise, as, e.g., might be one ton of coal,
one automobile, one ton-mile of transportation,
is not comparable from situation to situation.
Thus the HM is suggested as the appropriate average
to use in a work-limit test in psychology or
in industry, where a common procedure is to record
the time it takes a workman to perform a unit of
work.

Another obvious consideration is to select 
the average with the smaller standard error. If 
the Y measures are more nearly normally distributed
than the X measures, the HM will generally
be more trustworthy than the M.

It was noted that the G.M. price index yields
consistent results as changes in the basal year
are made, and that in this respect there is a
systematic error in the arithmetic mean index.
There is a systematic error in the HM price index,
but it is of the opposite sort to that in the M
price index. As a result, the average (preferably
geometric) of an M and an HM price index is
unbiased.


PROBLEMS


Problem 1. Prove that the G.M. of ratios of
measures in two paired series is equal to the ratio
of the G.M.'s of the two series. Suggested notation
follows:


Let $p_{1a}$ = price of commodity $a$ at date 1, and
similarly for $p_{1b}, \ldots p_{1n}$; $p_{2a}, p_{2b}, \ldots p_{2n}$; $p_{3a}$,
$p_{3b}, \ldots p_{3n}$, etc. Then $p_{1a}/p_{2a}$ = a price ratio of
commodity $a$ at date 1 to 2. Any average of these
for several commodities, which we will call $I_{12}$,
is a price index. Problem: to find such an
average that $I_{12}/I_{32} = I_{13}$. This asserts, for example,
that if the price indexes using 1920 as
the basal year ($I_{1910,1920}$, $I_{1911,1920}$, $\ldots$)
are available, the price indexes
for any other basal year, say 1930, are available:


$I_{1910,1930} = \dfrac{I_{1910,1920}}{I_{1930,1920}}$, $I_{1911,1930} = \dfrac{I_{1911,1920}}{I_{1930,1920}}$, etc.


Prove that the simple G.M. of price ratios is such
an average. Also prove that the arithmetic mean
of price ratios is not.


CHAPTER VIII
THE NORMAL DISTRIBUTION 


SECTION 1. THE NORMAL DISTRIBUTION AS 
DESCRIPTIVE OF CHANCE DISTRIBUTIONS 
AND DISTRIBUTIONS OF ERRORS 


Phenomena of great diversity have suggested 
the normal distribution to the mind of man. It 
inevitably came to the attention of analytical 
gamesters interested in the probability of events
when coins were tossed, dice thrown, cards drawn
from a deck, and roulette wheels spun. An important
relationship inherent in all of these
cases is that, starting with distributions of raw
data which are rectangular, or point rectangular,
one approaches normal distributions of statistics
which are aggregates or averages. If an unbiased
die is thrown many, say N, times we approach the
following distribution:


Number of pips            1     2     3     4     5     6
Number of occurrences    N/6   N/6   N/6   N/6   N/6   N/6


which we call a point rectangular distribution.
This certainly has no semblance to a normal distribution.
Consider a still simpler case: the
number of heads showing when an unbiased coin is
tossed. If it is tossed N times the distribution
will approach the following:


Number of heads           0     1
Number of occurrences    N/2   N/2


If we throw 20 coins at a time and compute for
each throw the mean number of heads, we will
approach the binomial distribution given by the
successive terms in the expansion of $(.5 + .5)^{20}$
which is very nearly normal. Though the distribution
of single items is two point rectangular,
the distribution of means of such tends toward
normality. The writer knows of no exception to
the proposition that the distribution of statistics
which are means or aggregates of original
measures, no matter their form, tends toward
normality if the number of cases entering into
the mean or aggregate is large.
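
How nearly normal the terms of $(.5 + .5)^{20}$ are can be seen by setting each term beside the ordinate of a normal curve having the same mean and standard deviation. The sketch below is illustrative only; the function names are not from the text.

```python
import math


def binomial_terms(n, p=0.5):
    """Successive terms in the expansion of (q + p) ** n, with q = 1 - p."""
    q = 1.0 - p
    return [math.comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


def normal_ordinate(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


n = 20
terms = binomial_terms(n)                 # probability of 0, 1, ..., 20 heads
mean = n * 0.5                            # 10 heads
sd = math.sqrt(n * 0.5 * 0.5)             # square root of 5

# Largest discrepancy between a binomial term and the matching normal ordinate
largest_gap = max(abs(t - normal_ordinate(k, mean, sd)) for k, t in enumerate(terms))
```

Repeating the comparison with a larger n shows the discrepancy shrinking, which is the sense in which the binomial approaches the normal curve as a limit.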

The normal distribution is a limiting form and
the number of paths whereby to approach it is
infinite. A. DeMoivre (1733) is to be credited 
with being the first mathematician to derive the 
normal curve and compute probabilities based 
thereon. 

At an early date astronomers found that the 
distribution of their angular observations of a 
star, which could only be considered to be errors 
of observation, approached normality as the 
number of observations increased. We have here 
a phenomenon that is normally distributed, though 
it is not an average or aggregate. In general
we may anticipate that when a fixed point is
aimed at, the distribution of errors resulting
will either be normal or will become so under some
simple transformation of these measures.
To cite two cases involving
transformed values: If lifted weights are com- 
pared with a standard weight the distribution 
of errors, expressed in grams, will not be as 
closely normal as will the distribution of errors 
if the logarithms of weights have been employed. 
Again, if teachers attempt to classify pupils 
according to mental age, attempting, say, to put 
into a single group all those of a certain men- 
tality, the distribution of their errors expressed 
in terms of mental age will not be as closely 
normal as will the distribution expressed in 
sensed difference units. The writer offers 
the hypothesis that all errors of observation 
are normally distributed when expressed in units 
which are intrinsically appropriate to the sense 
organ or faculty making the observation. 


SECTION 2. THE NORMAL DISTRIBUTION AS DESCRIPTIVE 
OF BIOLOGICAL AND SOCIAL PHENOMENA 


Abraham DeMoivre, the discoverer in 1733 of
the normal curve as a limiting form to the binomial
$(a+b)^n$, also conceived of it in the
broadest manner, for deviations from it were no
less than fluctuations from the original design
of the Deity. It was not only descriptive of the
universe of physics, but also of biology, sociology,
philosophy, and religion. DeMoivre's
successors scarcely held this breadth of view,
though Laplace later in the same century and
Gauss, beginning with his Theoria motus corporum
coelestium in 1809, both greatly extended the
concept in connection with mathematics and the
physical universe. Still later the astronomer
Quetelet turned his attention to human affairs,
writing, in 1846, Lettres . . . sur la théorie
des probabilités, appliquée aux sciences morales
et politiques, and in 1871 upon Anthropométrie ou
Mesure des Différentes Facultés de l'Homme.

Herewith is a striking table prepared by
Quetelet:


TABLE VIII A


QUETELET'S TABLE OF HEIGHTS OF
WAR SOLDIERS


HEIGHT IN METERS,*      NUMBER      PROPORTION PER 1000      DIFFERENCE BETWEEN
by steps of 0.0255 m   recorded     measured                 observed and
                                    OBS.        CALC.        calculated values

1.397 to 1.524
. . .
1.626                    1237         48          42              +6
1.651                    1947         75          72              +3
1.676                    3019        117         107             +10
1.702                    3475        134         137              -3
1.727                    4054        157         153              +4
1.753                    3631        140         146              -6
1.778                    3133        121         121               0
1.803                    2075         80          86              -6
1.829                    1485         57          53              +4
1.854                     680         26          28              -2
1.880                     343         13          13               0
1.905                     118          5           5               0
1.930                      42          2           2               0
1.956 and over
                                                                 -28
                                                                 +28


* Internationaler Statistischer Congress in Berlin, t. II,
page 748.

Could any student, if the first to make the
tabulations shown in Table VIII A, fail to be
struck with wonder at the evidence of a permeating
design? Quetelet wrote of this and other data
(1871):
"It is especially noteworthy that human height,
though apparently having been developed in a fortuitous
manner, is nevertheless subject to a very
exact law; and this is not peculiar to height,
but is observable also in connection with weight,
strength and speed of man, and further in his
intellectual and moral qualities. We see one of
the most wonderful laws of creation in this or- 
derly diversity which is accomplished entirely 
without the intervention of the human will." 
This was both a return to the breadth of view of 
DeMoivre and an immediate stimulus to Francis 
Galton whose studies of hereditary genius became 
enriched with the concepts of normal distribution 
and correlation. 

In Galton's Natural Inheritance (1889) is 
found a Table entitled "Data for Schemes of Dis- 
tribution of Various Qualities and Faculties 
among the Persons Measured at the Anthropometric 
Laboratory of the International Exhibition of 
1884." Galton used the median as the measure of 
central tendency and Q', a smoothed estimate of 
the quartile deviation, as the measure of varia- 
bility. Using these and the normal distribution 
he derived measures as shown in Table VIII B. 

With the more refined present-day methods of 
testing goodness of fit the statistician could 
show that some of the traits of Table VIII B are
not normally distributed, but nevertheless the 
deviation is not great and the major impression 
gotten by Galton, that the traits are normally 
distributed, was a great advance over that of 
his contemporaries who, except for a few, must 
have held either no view or a decidedly erroneous
one. 

TABLE VIII B
DEVIATIONS FROM THE MEDIAN IN TERMS OF Q'

[Table values not recoverable. Rows, by subject of measurement: height standing without shoes; height sitting; span of arms; weight in ordinary indoor clothes; breathing capacity; strength of pull as archer with bow; grip, strongest hand; swiftness of blow (ft. per sec.); keenness of vision, distance of reading test type (inches); sums; means multiplied by 1.015 to make Q = 1; normal values when Q = 1.]

The nature of the distribution of mental
traits is, of course, a function of (a) the method
of selection of the group examined, (b) the units
of measurement employed, and (c) the precision,
or lack of chance factors, in the measure
employed. If there are large chance errors we
should anticipate that, like errors of observation,
they would tend to be distributed normally.
Thorndike and Bregman (1924), after making adequate
allowance for the normalizing influence of
chance factors, concluded "that intellect in the
ninth grade, if measured in truly equal units, is
distributed approximately in [the normal manner]."


This finding pertained to "word tests," "space 
tests," and a more general test involving "se- 
lective and relational thinking, and generaliza- 
tion and organization" separately, and also to a 
composite based upon nine mental test measures 
given to some 14,000 widely distributed ninth- 
grade pupils. Their composite distribution is 
shown in Chart VIII I. 


CHART VIII I 
COMPOSITE CURVE FOR THE NINTH GRADE 


BASED UPON NINE SINGLE CURVES. THE BROKEN LINE 
INDICATES THE THEORETICAL NORMAL CURVE. 


In all these cases of normality mentioned,
and in innumerable other cases which could be 
cited, it is obvious that there has been a se- 
lective and environmental pressure operating. 
Geneticists have commonly secured point, or near 
point, distributions, but their plant and animal 
products are favored and not subject to the laws 
of survival of the plant and animal in the wild 
state. Many,—probably most and possibly all,— 
of the mutative point characters of the wild-type 
fruit fly are noncontributive to survival in the 
wild state. Institutional care of the feeble- 
minded fruit fly, the eyeless one, the wingless 
one, etc., permits survival. In the wild type 
such essential characters as wing-spread, weight,
etc., as condition survival betray unimodal sym- 
metrical or slightly skewed distributions. The 
experimental facts of genetics and medicine and 
the observable facts of stable plant, animal, and 
human life suggest that elementary causes (genes, 
types of infection, specific foods, vitamins, etc.) 
which, if they could be singly operative, would 
lead to point phenomena, do in fact so combine 
with innumerable other elementary or complex 
causes as to yield adjustment and survival traits 
which are of the unimodal symmetrical or slightly 
skewed types. A notable exception to this is
sex, which characteristically yields a dromedary- 
back distribution, one hump for male and the 
other for female. Except for certain mental as- 
sociates of sex and certain non-sex linked char- 
acteristics which are unimportant for survival, 
such as taste sensitivity, psychological investigation
and experimentation have not as yet established
the existence of mental traits of the
dromedary type, though innumerable researches 
have been such as to permit the betrayal of bi- 
or multi-modality, if present. The characteris- 
tic distributions of sociology and economics are 
unimodal, as also are most, but not all, of those
of physiology and medicine. We cannot conclude
that normality of distribution is universal in 
biological and social phenomena, though we may 
confidently expect to find distributions which 
do not differ greatly from it. 


SECTION 3. THE NORMAL DISTRIBUTION AS RELATED
TO ADVANCED STATISTICAL THEORY 


Every experimental situation reduces to that
of the significance of some difference. There
are two aspects of significance, (a) the magnitude
and (b) the trustworthiness of the difference.
Depending upon the problem, the practical
issue is (a) or (b) or an issue involving both.
To interpret magnitude some form of parent population
must be postulated, and in many situations
this is the normal distribution. To judge of
significance some law governing the errors of
observation and of measurement must be postulated,
and for cogent reasons this is taken to
be a normal distribution of errors in large samples
and the appropriate derivatives therefrom
for small samples. The t-distribution, shown
through the courtesy of Dr. Phillip J. Rulon, in
Chart VIII II, is appropriate for interpreting the
significance of means, differences of means, and
of regression coefficients, for small samples,
say N less than 15. It is the distribution of
these statistics computed from small samples drawn
from a parent normal distribution. The $\chi^2$ distribution,
shown in Chart VIII III, is that of the
squares of independent variables, all under the
assumption that these variables separately are
normally distributed. The highly serviceable
distribution of the ratio of two variances is another
derivative of normally distributed measures.


CHART VIII II

The t-Curve


CHART VIII III

CHI-SQUARE DISTRIBUTION
FOR VARIOUS DEGREES OF FREEDOM

If $X_1, X_2, \ldots X_n$ be uncorrelated, but associated, variables
(i.e., $r_{12}$, etc. = 0), each having a mean of zero, a standard deviation
of one, and normally distributed, then Curve d.o.f. = 1 is
the distribution of $X_1^2$. For this case the degrees of freedom or
number of dimensions in which variability takes place = 1 and
the distribution of $X_1$ is normal. Curve d.o.f. = 2 is the distribution
of $(X_1^2 + X_2^2)/2$. Curve d.o.f. = 3 is the distribution
of $(X_1^2 + X_2^2 + X_3^2)/3$. Etc. The curve for 30 or more degrees of
freedom is not, on the scale used, visually distinguishable from
a normal curve.

Thus, in practical statistics, the normal 
distribution directly serves many situations and,
by being the parent from which other distribu- 
tions are derived, indirectly serves the field 
of contingency, analysis of variance, small sam- 
ple theory, and still many other fields of ad- 
vanced statistics. 
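
The way the chi-square curves derive from normally distributed measures can be illustrated by random sampling. The sketch below draws independent normal variables with mean zero and standard deviation one, averages their squares over a stated number of degrees of freedom, and summarizes the result. The function names, the sample size, and the seed are arbitrary choices made for illustration. Theory gives such a mean square an expected value of 1 and a variance of 2 divided by the degrees of freedom.

```python
import random


def mean_square(dof, rng):
    """Mean of the squares of `dof` independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof)) / dof


def simulate(dof, n_samples=20000, seed=0):
    """Return the mean and variance of n_samples simulated mean squares."""
    rng = random.Random(seed)             # fixed seed so runs are repeatable
    values = [mean_square(dof, rng) for _ in range(n_samples)]
    m = sum(values) / n_samples
    v = sum((x - m) ** 2 for x in values) / n_samples
    return m, v


results = {dof: simulate(dof) for dof in (1, 2, 30)}
```

As the degrees of freedom increase the variance shrinks and the distribution loses its skewness, so that the simulated values cluster symmetrically about 1.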


SECTION 4. THE NORMAL DISTRIBUTION AS RELATED 
TO THE DESIGN OF EXPERIMENTS 


In general, when an experiment is drawn up, 
simplicity and directness of treatment and inter- 
pretation will be served if 


(a) the crucial variable can be analyzed into 
independent rather than correlated parts; 


(b) where correlations exist, regressions, as 
explained in Chapter XI, are linear, not 
curvilinear; 


(c) distributions are either rectangular or 
normal, not asymmetrical, bimodal, or of 
unusual kurtosis; 


(d) residual errors are normally distributed,
or, as in the case of t, chi-square, and F,
the variance ratio, distributed as some
function of a normal distribution.

Not infrequently a mathematical transformation
of an initially non-normal variable into
one normally distributed will, at the same time,
make (a) possible and bring about (b). Thus in
all four connections the normal distribution
plays an important and frequently crucial role.


SECTION 5. DERIVATION AND TABLES OF THE NORMAL CURVE 


There are innumerable ways of deriving the 
equation of the normal curve. Since all of these 
reveal it to be the limit of some process, the 
mathematical concepts of infinite series, limits, 
and of the calculus seem necessary to a complete 
understanding of its derivation. It has
frequently been shown that the successive terms of
the binomial (p + q)^N yield the distribution of
events when random samplings are drawn and also
when errors of observation or of measurement are
of a chance nature. It has also frequently been
shown that this distribution, when N becomes
large, approaches a normal distribution.
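
The approach of the binomial to the normal form can be checked numerically. The following is a minimal sketch, assuming p = q = .5 and a handful of arbitrarily chosen values of N; the function names are illustrative only. It compares each term of (p + q)^N with the ordinate of the normal curve having the same mean and standard deviation, and records the largest gap for each N.

```python
import math


def binomial_term(n: int, k: int, p: float) -> float:
    """Term of (p + q)^n giving the probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_ordinate(x: float, mean: float, sd: float) -> float:
    """Ordinate of a normal curve of unit area with the given mean and s.d."""
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


def largest_gap(n: int, p: float = 0.5) -> float:
    """Largest absolute gap between the binomial terms and the normal ordinates."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_term(n, k, p) - normal_ordinate(k, mean, sd))
        for k in range(n + 1)
    )


# Assumed sample sizes, chosen only to show the trend as N grows.
gaps = {n: largest_gap(n) for n in (4, 16, 64, 256)}
for n, gap in gaps.items():
    print(n, round(gap, 6))
```

The gap shrinks steadily as N increases, which is the numerical face of the limiting statement in the text; the sketch illustrates the result and is not a proof of it.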

The accompanying derivation, which is that of 
the simplest of the Pearson system of curves, 
nicely illustrates the mathematical simplicity 
of the normal curve. 

If, at any point in a continuous curve y = f(x),
a tangent to the curve is drawn, the slope,
or tangent of the angle which it makes with the
x axis, is approximately δy/δx, the ratio of
the small change taking place in y attendant upon
a small change in x. The slope is exactly this
ratio as the change in x, viz., δx, is made
infinitely small. The calculus notation is

    slope = \frac{dy}{dx}

We will observe that the slope properties of a
symmetrical, unimodal curve, origin at the mean,
whose tails approach ever more closely to the x
axis, are that the slope equals zero when x = 0
and also when y = 0 (which y does when x = ∞, or
-∞). Algebraically, the simplest slope equation,
called a differential equation, having
these properties is

    \frac{dy}{dx} = -cxy                                  [8:01]


in which c is some positive constant. Corresponding
to this equation expressing the slope
properties is another equation, its integral,
giving the relationship between y and x. This
equation is

    y = k e^{-c x^2 / 2}

in which k is some constant and e is 2.71828...,
the Naperian base of logarithms. If we now
choose c so that the standard deviation of the
distribution is equal to 1.00, and so choose k
that the total area under the curve is equal to
1.00, and substitute z for y merely to avoid
certain ambiguities in notation which would
otherwise arise, we have

    z = \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2}     Equation of unit normal
                                               distribution   [8:02]
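
The passage from the slope equation [8:01] to the unit normal equation [8:02] can be written out step by step. The following is a sketch of one standard route (separation of variables); the integration constant C and the two normalizing conditions are spelled out here as assumptions consistent with the conditions stated in the text, and the layout is not taken from the original.

```latex
\begin{align*}
\frac{dy}{dx} &= -cxy
  && \text{[8:01]} \\
\frac{dy}{y} &= -cx\,dx
  && \text{separate the variables} \\
\ln y &= -\tfrac{1}{2}cx^{2} + C
  && \text{integrate both sides} \\
y &= k\,e^{-cx^{2}/2}, \qquad k = e^{C}
  && \text{the integral of [8:01]} \\
\int_{-\infty}^{\infty} k\,e^{-cx^{2}/2}\,dx
  &= k\sqrt{\frac{2\pi}{c}} = 1
  && \text{total area equal to 1.00} \\
\frac{\int_{-\infty}^{\infty} x^{2}e^{-cx^{2}/2}\,dx}
     {\int_{-\infty}^{\infty} e^{-cx^{2}/2}\,dx}
  &= \frac{1}{c} = 1
  && \text{standard deviation equal to 1.00} \\
c &= 1, \qquad k = \frac{1}{\sqrt{2\pi}}
  && \text{whence [8:02]}
\end{align*}
```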
The area to the left of any point x is designated
p and is given by the integral of [8:02]. Thus

    p = \int_{-\infty}^{x} z \, dx
      = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x^2 / 2} \, dx
                          Normal probability of a measure
                          being less than x             [8:03]

Of course q (= 1 - p) is the area to the right of
x and is the probability of a measure being
greater than x. Since the curve is symmetrical
it is only necessary to table x and z values for
p between .5 and 1.0, or q between .5 and 0.0,
as has been done in Table XV C. For x and z
corresponding to a p value less than .5 one
enters the table with the complement of p, viz.,
with q, and finds the value of z and, after
attaching a minus sign, of x.

The two most common requirements of practical
statistics are for values of x and z, knowing p,
and for values of z and p, knowing x. Kelley's
extensive Table of Normal Probability Functions
(1947) gives directly x and z to eight decimal
places for an argument of p of four decimal places.
Entering Kelley's table with x yields p and z,
but not as conveniently as, e.g., does Sheppard's
Table (Pearson, 1914) in which the argument is x,
to two decimal places, and the tabled entries are p
[called in his notation .5(1 + α)] and z to seven
decimal places.

Of the many other tables of the normal curve 
the early and truly colossal table, considering 
the mechanical aids available at the time, of 
James Burgess (1897-98) and the later extensive 
tables of the National Bureau of Standards (1941 
and 1942) are especially notable. 

If we modify [8:02] so that the standard
deviation is σ instead of 1.0 and the area N
instead of 1.0 and call the ordinate y, the
equation is

    y = \frac{N}{\sigma \sqrt{2\pi}} e^{-x^2 / (2\sigma^2)}
                The normal equation with origin at the
                mean, number of cases = N, and standard
                deviation of σ                          [8:04]

If X is a raw score and M the mean, then x = X - M
and we may substitute X - M for x in [8:04] and
obtain the equation expressed in terms of raw
scores.


SECTION 6. CERTAIN PROPERTIES OF THE NORMAL
DISTRIBUTION


The exponential nature of the normal equation 
would seem, to the uninitiated, to suggest that 
its manipulation would be somewhat involved, but 
this is not the case because the intrinsic rela- 
tionships maintaining in the curve are extremely 
simple, being equaled in simplicity only by those 
inherent in the rectangular and in the Poisson 
distributions, and because very adequate tables 
of the normal probability functions are available 
to the statistician. Just as logarithms became 
a daily device when adequate tables were pub- 
lished, so the normal curve has become such a 
tool because of the tabling of its functions. 

We let μ'_n be the n'th moment from the raw
score origin and μ_n the n'th moment from the
mean. The successive moments of the normal
distribution [8:04] are as follows:

    \mu_0 = \int_{-\infty}^{\infty} y \, dx = N,
            the number of cases                         [8:05]

Sometimes μ_0 is defined as \frac{1}{N} \int_{-\infty}^{\infty} y \, dx
in which case it of course = 1.0.

    \mu'_1 = M, the mean.

    \mu_1 = \frac{1}{N} \int_{-\infty}^{\infty} y x \, dx = 0;
            and every other odd moment
            \mu_3 = \mu_5 = \dots = 0                   [8:06]

    The mean deviation
          = \frac{1}{N} \int_{-\infty}^{\infty} y |x| \, dx
          = \sqrt{2/\pi} \, \sigma = .7978846 \sigma    [8:07]

    \mu_2 = \frac{1}{N} \int_{-\infty}^{\infty} y x^2 \, dx
          = \sigma^2 = V, the variance                  [8:08]

    Mean |x^3| = \frac{1}{N} \int_{-\infty}^{\infty} y |x^3| \, dx
               = 2 \sqrt{2/\pi} \, \sigma^3             [8:09]

Mean [к= = оо 


Every even moment is derivable from V and the
even moment, which is two less than the original
moment, through the relationship

    \mu_n = (n-1) V \mu_{n-2}, wherein n is even        [8:12]

so that

    \mu_4 = 3 V^2                                       [8:13]
    \mu_6 = 15 V^3                                      [8:14]
    \mu_8 = 105 V^4                                     [8:15]

    \beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0,
              Normal skewness (Pearson)                 [8:16]

    \beta_2 = \frac{\mu_4}{\mu_2^2} = 3,
              Normal kurtosis (Pearson)                 [8:17]

    \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = 0,
               Normal skewness (Fisher)                 [8:18]

γ_1 is a theoretical value and Fisher's sample
estimate of it is called g_1.

    \gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 = 0,
               A measure of normal kurtosis (Fisher)    [8:19]

γ_2 is a theoretical value and Fisher's sample
estimate of it is called g_2.

The equations just given for μ_0, μ'_1, μ_1, and
μ_2 hold for all distributions, those for the odd
moments for all symmetrical distributions, and
that for μ_4 for all mesokurtic distributions.
The normal equation is that particular symmetric
mesokurtic distribution for which the reduction
equation [8:12] holds.
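
The reduction equation [8:12] lends itself to a numerical check. The sketch below, offered under stated assumptions, integrates x^n against a normal ordinate (taking N = 1, so that moments are per case) by a simple midpoint rule and compares the result with the value built up from [8:12]. The standard deviation of 1.5, the range of 12 standard deviations, and the function names are arbitrary illustrative choices.

```python
import math


def normal_ordinate(x: float, sd: float) -> float:
    """Ordinate of a normal curve of unit area with standard deviation sd."""
    return math.exp(-x * x / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))


def moment_by_integration(n: int, sd: float, steps: int = 20000) -> float:
    """n'th moment about the mean, by the midpoint rule over +/- 12 sd."""
    lo, hi = -12 * sd, 12 * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += x**n * normal_ordinate(x, sd)
    return total * h


def moment_by_reduction(n: int, variance: float) -> float:
    """Even moment from mu_n = (n - 1) V mu_{n-2}, starting at mu_0 = 1."""
    mu = 1.0
    for m in range(2, n + 1, 2):
        mu *= (m - 1) * variance
    return mu


SD = 1.5  # assumed value, for illustration only
for n in (2, 4, 6, 8):
    print(n, moment_by_integration(n, SD), moment_by_reduction(n, SD**2))

beta_2 = moment_by_integration(4, SD) / moment_by_integration(2, SD) ** 2
```

The two columns agree closely for each even n, the odd moments integrate to zero by symmetry, and the ratio of the fourth moment to the squared second moment comes out at 3, the normal kurtosis.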
An approximate plotting of a normal curve can
readily be made from the values at five points,
as follows:

    x                          -2.50   -1.00     .00    1.00    2.50
    z                          .0175     .24     .40     .24   .0175
    Ratio of z to z
    at the mean                 .044     .60    1.00     .60    .044

When plotting, the slope is, of course, to be
made zero at x = 0, and the points of inflexion
of the curve are at -1 (i.e., -1σ) and 1 (i.e.,
+1σ).

The relationships between the standard deviation
and other measures of variability are, in
the normal distribution, as given herewith:

    A.D. = \sqrt{2/\pi} \, \sigma = .7978846 \sigma     [8:20]
    Q = .6744898 \sigma                                 [8:21]
    P_{.93} - P_{.07} = 2.95158206 \sigma               [8:22]


The very slightly more precise normal optimal 
interpercentile range is 


Pogas. Роба ^ ЗОО, о, 2 [8:228] 


The relative merit of these measures of
variability, for samples drawn from normal
populations, is shown in Table VI G.
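
The constants connecting the standard deviation with the average deviation, the quartile deviation, and the interpercentile range can be recomputed directly. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are illustrative, and the results are expressed in units of the standard deviation.

```python
import math
from statistics import NormalDist

unit = NormalDist()  # unit normal distribution: mean 0, standard deviation 1

# Average (mean) deviation of a normal distribution, in sigma units.
average_deviation = math.sqrt(2 / math.pi)

# Quartile deviation: half the distance between the first and third quartiles.
quartile_deviation = (unit.inv_cdf(0.75) - unit.inv_cdf(0.25)) / 2

# Interpercentile range between the 7th and 93rd percentiles.
range_07_93 = unit.inv_cdf(0.93) - unit.inv_cdf(0.07)

print(round(average_deviation, 7))
print(round(quartile_deviation, 7))
print(round(range_07_93, 8))
```

Multiplying any of these by an observed standard deviation gives the corresponding measure of variability for a normal distribution having that standard deviation.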

The most common use to which the normal
distribution is put is in finding the probability
of a measure exceeding a certain value. This is
given by the proportion of cases of greater value.
It is necessary to express the value in which one
is interested as a standard score. If the
value is X, we compute the standard score

    x = \frac{X - M}{\sigma}        A standard score    [8:23]

Table VIII C gives q, the proportion of cases
greater than x for a few selected values of x.


TABLE VIII C
NORMAL PROBABILITIES


      x              q, PROBABILITY     2q, PROBABILITY OF EXCEEDING x
STANDARD SCORE       OF EXCEEDING x     OR FALLING SHORT OF -x

    .0                  .50                1.0
    .67449              .25                 .50
   1.                   .1587               .3173
   1.64485              .05                 .10
   1.95996              .025                .05
   2.                   .02275              .0455
   2.32635              .01                 .02
   2.57583              .005                .01
   3.                   .00135              .00270
   3.09023              .001                .002
   3.71902              .0001               .0002
   3.89059              .00005              .0001
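
Entries of this kind can be recomputed rather than looked up. The following is a small sketch, assuming Python's standard `statistics.NormalDist`; the helper name `q_exceeding` is illustrative. For each standard score it gives q, the proportion of cases above x, and 2q, the proportion lying beyond x in either direction.

```python
from statistics import NormalDist

unit = NormalDist()  # unit normal distribution


def q_exceeding(x: float) -> float:
    """Proportion of a unit normal distribution lying above the standard score x."""
    return 1.0 - unit.cdf(x)


standard_scores = (0.0, 0.67449, 1.0, 1.64485, 1.95996, 2.0,
                   2.32635, 2.57583, 3.0, 3.09023, 3.71902, 3.89059)

for x in standard_scores:
    q = q_exceeding(x)
    print(f"{x:8.5f}  {q:.5f}  {2 * q:.5f}")
```

For a raw score X the same function applies after conversion to a standard score by subtracting the mean and dividing by the standard deviation.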


Since errors of observation are, in general,
distributed in a nearly normal manner, an error
in excess, in absolute amount, of .67449σ is as
likely to occur as an error less, in absolute
amount, than .67449σ. For this reason, since its
first use by Bessel and Gauss in 1815 and 1816,
the magnitude .67449σ has been called a "probable
error."

    P.E. = .6744898 \sigma                              [8:24]


The precision of a statistic may be indicated by
attaching the probable error to it thus: 14.7 ±
1.2. An alternative procedure is to give its
standard deviation, called a standard error when
the deviation in question is conceived of as an
error: 14.7 (st. er. = 1.8). Since this
second procedure is arithmetically simpler, as
it does not involve multiplication by .67449, and
is more accurate, as it does not by implication
assume a normal distribution of errors, it will
be used in this text. The designation of the
standard error by the ± sign, though found, is
definitely bad practice, in that this sign has
earlier and systematically been employed in
connection with the probable error.

The probable error and the quartile deviation
have been confused. Q = (Q_3 - Q_1)/2, one-half the
difference between the first and the third
quartiles in any distribution, and P.E. = .67449σ in
any distribution. Q and P.E. happen to be equal
in the case of the normal distribution, but in
general they are not equal and should not be
used interchangeably.

We may note three degrees of refinement in
judging of the precision of a statistic: (a)
reporting its P.E. and thus by implication
assuming a normal distribution of errors; (b)
reporting its standard error, thus not stipulating
the form of distribution of errors; and (c) reporting
the presumptive form of distribution of errors
(perhaps a t-distribution, a chi-square
distribution, or a variance ratio distribution). The
third procedure is highly meritorious, and
especially so when small samples (or small number
of degrees of freedom) are involved, but most
problems of significance of a statistic are
answered by knowledge of its standard error, so
the student should become thoroughly familiar
with the computation of the standard deviations
of all sorts of statistics.

The ratio of a statistic to its standard error
is called a critical ratio.

In medical and chemical situations it is
conceivable that some virus, bacterium, or other
substance, may have an effect that is not
proportional to the frequency of the substance, but to
some power of the frequency. This suggests that
we study the distribution of y^r, where r is some
positive integral or fractional magnitude and y
is the ordinate in a normal frequency curve.
If r = 2,

    y^2 = \frac{N^2}{2\pi\sigma^2} e^{-x^2 / \sigma^2}

which is a normal distribution with variance
.5σ² and number of cases N²/(2σ√π). In general,

    y^r = \left( \frac{N}{\sigma\sqrt{2\pi}} \right)^r
          e^{-r x^2 / (2\sigma^2)}                      [8:24a]

This also is a normal distribution.


SECTION 7. STATISTICAL CONSTANTS DESCRIPTIVE 
OF PORTIONS OF A NORMAL DISTRIBUTION 


CHART VIII IV

[Chart not reproduced: a unit normal curve with points x_i
and x_j marked on the x axis.]


Chart VIII IV represents a unit normal
distribution. The magnitude x is a standard score,
the height of the curve at this point x is z, and
the proportion of the total area above this point
is q. The mean of the measures above x_i is d_i
and is approximately the distance labeled d_i.
The mean of the tail above x_j is d_j and the mean
of the portion between x_i and x_j is d_ij. Knowing
q, the magnitudes x and z are gotten from a table
of normal probability functions. A brief table
of these functions is given in Table XV C.

A formula giving d, the mean deviation of a
tail portion, is readily derived by means of the
calculus

    d = \frac{1}{q} \int_{x}^{\infty} x z \, dx = \frac{z}{q}
                The mean deviation of a tail portion
                of a unit normal distribution           [8:25]


If the tail is greater than one-half the curve, 
the formula becomes 


diera COO IY ЕО (8:26] 


The sign of d is positive if the tail is toward
the right and negative if toward the left, when
the x scale is from left to right.

Having a d, a distance in a unit normal
distribution, the corresponding deviation from the
mean if the standard deviation is σ is simply dσ.
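
The mean of a tail portion can be checked by brute force. The sketch below assumes the relation d = z/q for the portion of a unit normal distribution above x, and compares it with a direct midpoint-rule average of the scores in that tail; the function names, the upper limit of 12, and the number of steps are illustrative choices, not part of the text.

```python
from statistics import NormalDist

unit = NormalDist()  # unit normal distribution


def tail_mean(x: float) -> float:
    """Mean of the portion of a unit normal distribution above x: d = z / q."""
    z = unit.pdf(x)
    q = 1.0 - unit.cdf(x)
    return z / q


def tail_mean_by_integration(x: float, upper: float = 12.0, steps: int = 20000) -> float:
    """Same mean, obtained by averaging the scores above x with weights z."""
    h = (upper - x) / steps
    weighted_sum = 0.0
    weight = 0.0
    for k in range(steps):
        t = x + (k + 0.5) * h
        z = unit.pdf(t)
        weighted_sum += t * z
        weight += z
    return weighted_sum / weight


d = tail_mean(1.0)
print(d, tail_mean_by_integration(1.0))

# For a distribution with standard deviation sigma the deviation is d * sigma.
sigma = 6.0  # assumed value, for illustration
print(d * sigma)
```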

Let q_i be the proportion to the right of x_i,
q_j the proportion to the right of x_j, q_ij the
proportion in the interval from x_i to x_j, i.e.,
q_ij = (q_i - q_j). Since the mean of a sum is the
average of the means of the parts, each weighted
according to the number of cases, we have

    d_i q_i = d_{ij} (q_i - q_j) + d_j q_j,

which with [8:25] yields

    d_{ij} = \frac{z_i - z_j}{q_i - q_j} = \frac{z_i - z_j}{q_{ij}}
                The mean deviation of a portion of a
                unit normal distribution                [8:27]

As employed in this text q is generally a
proportion less than .5, but in [8:27] it is the
proportion to the right of the point of dichotomy
and may be less or greater than .5. When so
employed the sign of d_ij is correctly given by the
sign of (z_i - z_j).

Table VIII D herewith illustrates the
transformation of an ordered variable, teachers'
marks A, B, C, D, E, F, into quantitative
values, under the assumption that the talent
represented by the ordered data is normally
distributed. The mark A corresponds to a position
1.692 standard deviations above the mean, etc.


TABLE VIII D


NORMALIZING AN ORDERED VARIABLE
FROM TABLE XV C


           PERCENTAGE OF PUPILS
MARKS      RECEIVING MARK INDICATED         z            d

  A                ...                   .000000       1.692
  B                ...                   .192900        .588
  C                ...                   .397034       -.325
  D                ...                   .291399       -.989
  E                ...                   .190478      -1.533
  F                ...                   .052485      -2.386
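
The normalizing of an ordered variable can be sketched in a few lines. The percentages used below are hypothetical: they are assumed for illustration and chosen so that the result lands near the quoted value of 1.692 for the mark A, but they are not read from the table. Each mark receives the mean of its slice of the unit normal distribution, computed as the difference of the bounding ordinates divided by the proportion in the slice.

```python
from statistics import NormalDist

unit = NormalDist()  # unit normal distribution

# Hypothetical percentages, highest mark first (assumed, not from the table).
percentages = {"A": 11.4, "B": 34.7, "C": 32.5, "D": 10.2, "E": 9.0, "F": 2.2}


def normalized_values(percentages: dict) -> dict:
    """Mean standard score of each class, working down from the top class."""
    values = {}
    marks = list(percentages)
    q_above = 0.0   # proportion of cases above the lower limit of the class
    z_upper = 0.0   # ordinate at the upper limit of the class (0 at the extreme)
    for position, mark in enumerate(marks):
        share = percentages[mark] / 100.0
        q_above += share
        if position == len(marks) - 1:
            z_lower = 0.0  # the lowest class runs out to the extreme tail
        else:
            z_lower = unit.pdf(unit.inv_cdf(1.0 - q_above))
        values[mark] = (z_lower - z_upper) / share
        z_upper = z_lower
    return values


values = normalized_values(percentages)
for mark, value in values.items():
    print(mark, round(value, 3))
```

Because the class means are weighted by the class proportions, the weighted average of the resulting values is zero, as it must be for deviations from the mean of the whole distribution.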


It is frequently desirable to have further
statistics of portions of a normal distribution.
We herewith give the calculus statement for
obtaining the variance, the third moment and the
fourth moment of a portion bounded by x_i and x_j.
The calculus formulas are given to illustrate the
derivation, but are not needed in the practical
utilization of the method.

The mean of the squared deviations of the q_ij
measures in the interval x_i to x_j, from the mean
of the entire distribution is V'_ij, and from their
own mean, d_ij, it is as given by [8:29]

    V'_{ij} = \frac{1}{q_{ij}} \int_{x_i}^{x_j} x^2 z \, dx
            = 1 + \frac{x_i z_i - x_j z_j}{q_{ij}}
                Mean squared deviation, from total
                distribution mean, of a portion of a
                unit normal distribution*               [8:28]

Accordingly

    V_{ij} = V'_{ij} - d_{ij}^2
           = 1 + \frac{x_i z_i - x_j z_j}{q_{ij}} - d_{ij}^2
                Variance of a portion of a unit
                normal distribution                     [8:29]


By very similar procedures we obtain the third
and fourth moments of portions:

    \mu'_{3,ij} = \frac{1}{q_{ij}} \int_{x_i}^{x_j} x^3 z \, dx
                = \frac{x_i^2 z_i - x_j^2 z_j}{q_{ij}} + 2 d_{ij}
                Mean cubed deviation, from total
                distribution mean, of a portion of a
                unit normal distribution                [8:30]

    \mu_{3,ij} = \mu'_{3,ij} - 3 d_{ij} V'_{ij} + 2 d_{ij}^3
                Third moment of a portion of a unit
                normal distribution                     [8:31]

    \mu'_{4,ij} = \frac{1}{q_{ij}} \int_{x_i}^{x_j} x^4 z \, dx
                = \frac{x_i^3 z_i - x_j^3 z_j}{q_{ij}} + 3 V'_{ij}
                Mean fourth powered deviation, from
                total distribution mean, of a portion
                of a unit normal distribution           [8:32]

    \mu_{4,ij} = \mu'_{4,ij} - 4 d_{ij} \mu'_{3,ij}
                 + 6 d_{ij}^2 V'_{ij} - 3 d_{ij}^4
                Fourth moment of a portion of a unit
                normal distribution                     [8:33]


* Having p, q, x, and z as defined in connection with a unit
normal distribution, we have

    dp = z dx [8:28a], and

    dz = -xz dx [8:28b].

The various integrals here needed are then obtained by
integrating by parts and evaluating at the limits.


SECTION 8. THE SIGNIFICANCE OF DIFFERENCES BETWEEN
THE MEANS OF TAIL PORTIONS OF A NORMAL DISTRIBUTION


Let us be given a sample of N cases upon each
of which has been made a measurement, X, with, so
far as we can tell, the same precision throughout,
so that the standard error of measurement, s, may
be taken to be constant for all measures. The
variance, due to the error inherent in the
technique of measurement, of the mean of any
sub-sample of N_i cases is s²/N_i and is not a function
of the distribution of these N_i cases, nor of the
distribution of the N cases.

Let the mean of an upper tail portion of N_i
cases be D_i and the mean of a lower non-overlapping
tail portion of N_j cases be D_j. The difference
between the means is (D_i - D_j) and the variance
error of this difference is s²/N_i + s²/N_j. The
critical ratio of the difference divided by its
standard error is

    \frac{D_i - D_j}{s \sqrt{\frac{1}{N_i} + \frac{1}{N_j}}}

where the error in question is that due to
measurement and not due to sampling.

If the sample is normally distributed, we
readily can find the size of the tail portions
which make this critical ratio a maximum. It is
axiomatic that the upper and lower portions will
be equal. Let the X's be expressed in terms of
standard scores. Let x_i be the point of
dichotomy for the upper tail, q_i N the number of cases
in the tail, and d_i as given by [8:26] the mean
of the tail. For the lower tail the point of
dichotomy is -x_i, the number of cases q_i N, and
the mean -d_i. The critical ratio of difference
between means divided by standard error is

    \frac{2 d_i}{s \sqrt{\frac{2}{q_i N}}}
        = \frac{d_i}{s} \sqrt{2 q_i N}

in which N and s are constants. An examination
of a table of the normal probability functions z
and q enables us to find the value of q for which
this function is a maximum. It is the value for
which q = z/(2x), i.e., q = .2702678.

Thus when upper and lower groups are selected
in order to bring out most decisively the
difference between means of the upper and lower, the
two groups should consist of 27 per cent each,
if the distribution for the sample entire is
normal. *
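
The 27 per cent figure can be recovered by a direct search. The sketch below rests on one assumption beyond the text: that the mean of a tail of a unit normal distribution is z/q, so that, apart from the constant factor involving N and s, the critical ratio is proportional to z divided by the square root of q. The search interval, the number of rounds, and the function names are illustrative choices.

```python
import math
from statistics import NormalDist

unit = NormalDist()  # unit normal distribution


def relative_critical_ratio(q: float) -> float:
    """Critical ratio apart from constants: d * sqrt(q) = z / sqrt(q)."""
    x = unit.inv_cdf(1.0 - q)  # point of dichotomy leaving q above it
    return unit.pdf(x) / math.sqrt(q)


def best_tail_proportion(lo: float = 0.05, hi: float = 0.45, rounds: int = 100) -> float:
    """Ternary search for the tail proportion that maximizes the ratio."""
    for _ in range(rounds):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if relative_critical_ratio(a) < relative_critical_ratio(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


best_q = best_tail_proportion()
best_x = unit.inv_cdf(1.0 - best_q)
print(round(best_q, 7), round(unit.pdf(best_x) / (2 * best_x), 7))
```

The second printed value evaluates z/(2x) at the maximizing point, which should agree with q itself, this being the condition stated in the text.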


SECTION 9. FITTING A NORMAL CURVE TO DATA 


The equation of the fitted curve is

    y = \frac{N}{\sigma} z                              [8:34]

in which z is the ordinate in a table of the unit
normal distribution corresponding to the deviate
x [= (X - M)/σ]. It is frequently best to plot
this curve for values of X which are class
indexes, because then a precise comparison can be
made between the computed curve and the actual
data. The value of y given by [8:34] is the
frequency per unit interval. If the data have
been grouped, using an interval i, the frequency
called for is

    \tilde{f} = i y = \frac{N i}{\sigma} z              [8:35]


* A fuller proof and further aspects of the matter of Section 8
have been discussed by Truman L. Kelley, "The selection of
upper and lower groups for the validation of test items,"
J. Ed. Psych., Vol. 30, no. 1, Jan. 1939.


We will illustrate the steps in the
computation, using the New York daily maximum
temperature data of Chapter IV, Table IV B. The
calculation of the mean and standard deviation,
using a grouping interval of 3°, has already been
made in Chapter VI, Section 6. The M = 81.7742
and σ = 6.2358. In Table VIII E herewith we
give the class indexes, the equivalent standard
scores, x_i, for the class indexes, the
corresponding or equivalent ordinates, z_i, in the unit
normal distribution, these multiplied by Ni/σ
providing the theoretical frequencies, \tilde{f}_i, in
the classes, which are to be compared with the
actual frequencies, f_i, as shown. Though a
grouping of a number of the low frequency
classes is called for, the actual grouping
here employed for the upper and lower tail
portions is slightly coarser than is desirable, as
explained in Chapter IX, Section 5.

    x_i = \frac{X_i - M}{\sigma} = \frac{X_i - 81.7742}{6.2358}
        = .16036 X_i - 13.1137

    \tilde{f}_i = \frac{N i}{\sigma} z_i
                = \frac{62 \times 3}{6.2358} z_i = 29.8278 z_i

The difference between f_i and \tilde{f}_i is the cell
divergence; (f_i - \tilde{f}_i)² / \tilde{f}_i is the cell square
contingency; and the sum of these for all cells is the
square contingency, χ². Chi-square is the
statistic which provides a test of the goodness-of-
fit of the observed data to the theoretical curve.


TABLE VIII E


COMPUTATION AND FREQUENCIES FOR NORMAL CURVE
FITTED TO MAXIMUM TEMPERATURE DATA OF TABLE IV G


CLASS                                      \tilde{f}         (f - \tilde{f})^2
INDEX      x          z       \tilde{f}    (grouped)    f       / \tilde{f}

  60   -3.49181    .000898      .03
  63   -3.01071    .00429       .13
  66   -2.52962    .01627       .49          5.59       4        .45
  69   -2.04853    .04894      1.46
  72   -1.56743    .11679      3.48
  75   -1.08634    .22113      6.60          6.60       6        .05
  78    -.60525    .33217      9.91          9.91       5       2.43
  81    -.12415    .39587     11.81         11.81      23      10.60
  84     .35694    .37432     11.17         11.17      13        .30
  87     .83803    .28081      8.38          8.38       6        .68
  90    1.31913    .16713      4.99          4.99       1       3.19
  93    1.80022    .07892      2.35
  96    2.28131    .02957       .88
  99    2.76240    .00879       .26          3.56       4        .05
 102    3.24350    .00207       .06
 105    3.72459    .000388      .01
                                           ------     ----     ------
                                            62.01      62      17.75

χ² = 17.75; degrees of freedom = 5; χ²/5 = 3.55;
F = 3.55 and P = .0034


The frequencies of the observed distribution
in the successive classes are recorded in the
f column, and the theoretical frequencies in
the \tilde{f} column.

This is not the chapter in which fully to
discuss χ², degrees of freedom, and the goodness-
of-fit test, but it will be noted that since the
test involved eight class frequencies and since
the theory accepted three constants determined
from the data, namely, N, the mean, and the
standard deviation,* there are eight minus three, or
five, degrees of freedom. Thus (χ²/5)/1 is F,
a variance ratio with five degrees of freedom in
the numerator variance and an infinite number in
the denominator. By methods given in Chapter IX,
an equivalent P, the probability that a
divergence as great as that observed may be
attributed to chance, is found and this constitutes the
final evidence as to goodness-of-fit.
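
The fitting and the chi-square computation can be retraced mechanically. The following sketch takes N = 62, an interval of 3, M = 81.7742, and σ = 6.2358, together with the eight cells of observed frequencies as they appear grouped in the table; the names and the layout of `cells` are illustrative. It does not attempt the final conversion of χ² into P.

```python
import math

N, INTERVAL, MEAN, SD = 62, 3, 81.7742, 6.2358

# Each cell: (class indexes pooled into the cell, observed frequency).
cells = [
    ((60, 63, 66, 69, 72), 4),
    ((75,), 6),
    ((78,), 5),
    ((81,), 23),
    ((84,), 13),
    ((87,), 6),
    ((90,), 1),
    ((93, 96, 99, 102, 105), 4),
]


def unit_ordinate(x: float) -> float:
    """Ordinate z of the unit normal distribution at the standard score x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def theoretical_frequency(class_index: float) -> float:
    """Frequency called for in a class: (N i / sigma) z."""
    x = (class_index - MEAN) / SD
    return N * INTERVAL / SD * unit_ordinate(x)


chi_square = 0.0
for indexes, observed in cells:
    expected = sum(theoretical_frequency(index) for index in indexes)
    chi_square += (observed - expected) ** 2 / expected

# Three constants (N, mean, standard deviation) were taken from the data.
degrees_of_freedom = len(cells) - 3

print(round(chi_square, 2), degrees_of_freedom, round(chi_square / degrees_of_freedom, 2))
```

Small differences in the last decimal place from the tabled figures are to be expected, since the table works from ordinates rounded to five places.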

It would commonly be stated that the theory
being tested is whether the deviations from
normality present in the observed data are of such
magnitude as to be attributed to chance, if the
observed 62 items were randomly drawn items from a
normal parent population. This is not quite
accurate, for actually the theory being tested
concerns itself with that infinite number of
sub-samples of 62 each of a parent population for
which the mean is 81.7742 and the standard
deviation is 6.2358. These sub-samples do themselves
constitute an infinite population and the theory
being tested is whether the observed sample is a
chance deviate from such sub-samples. For
present purposes the distinction is important
because it accounts for the loss of three degrees
of freedom.

For the data in hand, with P = .0034, we see
that there is less than 1 chance in 100 that the
theory of normality is tenable. Knowing nothing
about the source of the data we would draw this
conclusion, but knowing the source and knowing
that successive daily maximum temperatures at a
single spot are correlated and not independent,
we have a ready explanation in that our items
are not independent random samplings. Prior to
the study we would, in fact, have cogent grounds
for expecting the distribution to be non-normal,
even though we do not know just what form to
expect.


* V = (Σfx²)/N is a linear function of the f's.


SECTION 10. THE DISTRIBUTION OF THE SUM OF 
NORMAL VARIABLES 


Let x_1, x_2 each be deviations from means,
normally distributed, and independent of each
other, with variances V_1 and V_2, and let x = x_1 + x_2.
We shall determine the moments of the
distribution of x.

    x^k = x_1^k + k x_1^{k-1} x_2
          + \frac{k(k-1)}{2} x_1^{k-2} x_2^2 + \dots + x_2^k

In getting the k'th moment of x we observe that
all summations involving x_1 or x_2 to an odd
power vanish because the variables are independent
and the summation of odd powers of each
separately is zero; and summations involving even
powers, g and h, can be expressed in parts, thus

    \sum x_1^g x_2^h = N \mu_{g,1} \mu_{h,2}

in which \mu_{g,1} is the g'th moment of x_1 and
\mu_{h,2} the h'th moment of x_2. Finally, utilizing
[8:12] in connection with the moments of x_1 and
x_2, we obtain after making the necessary
summations and substitutions, in case k is even,
the following k'th moment of x:

    \mu_k = (k-1)(k-3) \dots (3)(1) (V_1 + V_2)^{k/2}

Of course

    \mu_{k-2} = (k-3)(k-5) \dots (3)(1) (V_1 + V_2)^{(k-2)/2}

Since μ_2 = V_1 + V_2 we obtain

    \mu_k = (k-1) \mu_2 \mu_{k-2}

This is [8:12], the fundamental relationship
between the even moments in a normal distribution.
We have thus proven that x, the sum of two
normally distributed independent variables, is
normally distributed. It is noteworthy that this
maintains even though the variables have unequal
variances.

If x is the sum of three normally distributed
independent variables, then x is normally
distributed. The proof is immediate, for we can
first combine x_1 and x_2, getting a normally
distributed variable, and this combined with x_3
yields a normally distributed variable.

Finally, the sum of any number of independent
normally distributed variables is normally
distributed.

If we let x'_1 = -x_1 and substitute x'_1 for
x_1 in the sum, the preceding statement obviously
holds. The proof that adding a constant, or
multiplying a variable by a constant, does not
alter the relationship is equally simple, so we
can assert: Any linear combination of normally
distributed independent variables is normally
distributed.
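
The arithmetic of combining independent normal variables is built into Python's standard `statistics.NormalDist`, whose documented `+`, `-`, and `*` operations treat two distributions as independent. The sketch below uses assumed standard deviations of 3 and 4, chosen only so that the answer is easy to see; it illustrates the bookkeeping of means and variances and is not a substitute for the proof given above.

```python
from statistics import NormalDist

# Two independent normal variables, as deviations from their means,
# with unequal standard deviations (assumed values, for illustration).
x1 = NormalDist(mu=0.0, sigma=3.0)
x2 = NormalDist(mu=0.0, sigma=4.0)

# Sum of two independent normal variables: the variances add.
total = x1 + x2

# A linear combination: multiply by constants, reverse a sign, add a constant.
combination = 2 * x1 - x2 + 10

print(total)
print(combination.mean, combination.variance)
```

The variance of the sum is 9 + 16 = 25, and that of the combination is 4(9) + 16 = 52, the constant 10 shifting only the mean.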


SECTION 11. THE RELATIONSHIP OF CHI-SQUARE, 
T-SQUARE, AND F TO THE NORMAL DISTRIBUTION 


The purpose of this section is to state and
illustrate, but not to prove these relationships.
Let Δ be a magnitude of the small order of
the differentials of calculus. Let

    P_1 = \frac{1}{\sigma_1 \sqrt{2\pi}}
          \exp\left( \frac{-x_1^2}{2 V'_1} \right) \Delta_1

This equals the probability of a measure falling
in the interval x_1 to x_1 + Δ_1. Similarly for a
second variable

    P_2 = \frac{1}{\sigma_2 \sqrt{2\pi}}
          \exp\left( \frac{-x_2^2}{2 V'_2} \right) \Delta_2

If x_1 and x_2 are independent, the probability of
the joint occurrence, or of obtaining a case
falling within the Δ_1 interval for the first
variable and within the Δ_2 interval for the second
variable is

    P_1 P_2 = \left[ \frac{1}{\sigma_1 \sigma_2 (2\pi)}
              \exp -\frac{1}{2} \left( \frac{x_1^2}{V'_1}
              + \frac{x_2^2}{V'_2} \right) \right] \Delta_1 \Delta_2

The distribution in two-dimensional space of the
function within the brackets is related to a χ²
distribution with two degrees of freedom, and in
general the i dimensional distribution of

    \frac{1}{\sigma_1 \sigma_2 \dots \sigma_i (2\pi)^{i/2}}
    \exp -\frac{1}{2} \left( \frac{x_1^2}{V'_1} + \frac{x_2^2}{V'_2}
    + \dots + \frac{x_i^2}{V'_i} \right)

is related to a χ² distribution with i degrees
of freedom. The use of V'_i instead of V_i is to
avoid ambiguity with the more usual meaning of
V_i as given in [8:38].

On the two-dimensional surface having
dimensions x_1/σ_1 and x_2/σ_2, the distance from the
origin of any point is
\sqrt{(x_1/\sigma_1)^2 + (x_2/\sigma_2)^2}, a one-
dimensional variable, which we designate χ, chi.
Of course it > 0.

    \chi^2 = \frac{x_1^2}{V'_1} + \frac{x_2^2}{V'_2}

Since x_1 and x_2 are uncorrelated the mean of
this χ², if many samples are taken, is

    Mean \chi^2 = \frac{Mean\ x_1^2}{V'_1} + \frac{Mean\ x_2^2}{V'_2}
                = 1 + 1 = 2 = the number of d.o.f.

For the general case we have

    Mean \chi_i^2 = i     Mean of χ² with i d.o.f.      [8:36]

Since the mean of (χ²_i / i) = 1 (the median < 1 because of
skewness of the distribution), any set of x_1, x_2,
..., x_i values yielding a χ²_i / i > 1 (more precisely
greater than the median) tends to indicate
variability factors in χ² in excess of those
postulated. We have

    \chi_i^2 = \frac{x_1^2}{V'_1} + \frac{x_2^2}{V'_2}
               + \dots + \frac{x_i^2}{V'_i}
                Chi-square with i degrees of
                freedom                                 [8:37]


From the structure shown in [8:37], χ²_i / i is
clearly a variance, as is also χ² itself. To
simplify future notation we write

    V_i = \chi_i^2 / i                                  [8:38]

This V_i is the quantity entering into a
variance ratio

    F_{ij} = \frac{V_i}{V_j}
                A variance ratio based upon i (numerator)
                and j (denominator) d.o.f. See [9:21]

As a limiting case a χ²/i is a variance ratio in
which the d.o.f. of the denominator variance is
∞, for V_∞ = 1.
Accordingly

    F_{i\infty} = \frac{V_i}{1} = \frac{\chi_i^2}{i}
                Chi-square expressed as a variance
                ratio                                   [8:39]
The variance ratio distribution is that of a
quotient. The distribution of the numerator
alone of F_ij is that of χ²_i / i, and of the
denominator is that of χ²_j / j. Consider a scatter
diagram, Chart VIII V, with dimensions χ²_i / i and χ²_j / j.
A fixed value of F is represented by a straight
line passing through the point (0,0). F + Δ_F is
another straight line slightly to the right of
the former and the frequency of cases between
the two lines is Δ_P, the probability of a
variance ratio between F and F + Δ_F. The entire
frequency to the right of the line F is P, the
probability of a variance ratio greater than F. Thus,
though several steps removed, F is a derivative
of the normal distribution.
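
The chain from normal deviates to χ² and thence to F can be imitated by random sampling. The sketch below is an illustration under assumptions: the degrees of freedom (5 and 20), the number of trials, the seed, and the trial value F = 2 are all arbitrary choices, and the estimate of P so obtained is subject to sampling error.

```python
import random
from statistics import mean, median

rng = random.Random(1947)  # fixed seed, an arbitrary choice, for repeatability


def variance_estimate(dof: int) -> float:
    """Chi-square divided by its d.o.f., built from independent unit normal deviates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof)) / dof


I_DOF, J_DOF, TRIALS = 5, 20, 20000  # illustrative choices

numerators = [variance_estimate(I_DOF) for _ in range(TRIALS)]
denominators = [variance_estimate(J_DOF) for _ in range(TRIALS)]
ratios = [v_i / v_j for v_i, v_j in zip(numerators, denominators)]

mean_v = mean(numerators)      # should lie near 1, the mean of chi-square / d.o.f.
median_v = median(numerators)  # should fall below 1 because of skewness

F_VALUE = 2.0
p_exceeding = sum(f > F_VALUE for f in ratios) / TRIALS  # estimate of P for this F

print(round(mean_v, 3), round(median_v, 3), round(p_exceeding, 3))
```

The proportion of simulated ratios exceeding the chosen F plays the part of the frequency to the right of the line F in the scatter diagram described above.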


CHART VIII V

[Chart not reproduced: scatter diagram with dimensions
χ²_i / i and χ²_j / j.]


The t² (Student's "t" squared) distribution is
an F_1j distribution, the distribution of a
variance ratio having one degree of freedom in the
numerator and j degrees in the denominator.

The x² (x of a unit normal distribution
squared) distribution is the distribution of an
F_1∞, a variance ratio having one degree of
freedom in the numerator and ∞ degrees of freedom
in the denominator.

The F distribution spans the field. However,
since the determination of P from F would
involve a three-way table, the dimensions being F,
i, and j, its complete tabulation would be a
colossal undertaking. This, in a modified form,
has been done in the 500-odd pages of Pearson's
Tables of the Incomplete Beta-Functions (1934).
Snedecor (1938), Fisher and Yates (1938), and
others have tabled F for selected values of P and
extended values of i and j.

In Chapter IX we give a transformation of F
which avoids the need of any of these tables and
requires instead only a normal probability table to
obtain P for any F.


CHAPTER IX
THE STATISTICS OF ATTRIBUTES
SECTION 1. SITUATIONS WHEREIN QUALITATIVE SERIES ARISE


Data are placed in categories when a quanti- 
tative relationship between the classes does not 
exist (more accurately "is not known to exist"), 
when the quantitative relationship is only vaguely 
surmisable, or when the known quantitative
relationship between classes is neglected because
the more primitive and simple qualitative methods 
would seem to suffice. Customary types of quali- 
tative series would be such as a number of men 
classified by nationality, or by eye color, or 
by vocation, or by religion, or by marital status, 
or by language spoken, etc. Handling such situ- 
ations involves the statistics of attributes, or 
of qualitative series, or of categories. The 
fundamental item is the frequency in a class. 


SECTION 2. THE FREQUENCY IN A CLASS 


Let the classes into which the data can fall
be a, b, c, ..., k and the observed frequencies
$f_a, f_b, f_c, \ldots, f_k$, and let $f_s$ stand for any
one of these frequencies as s takes values from
a to k inclusive. For many purposes it is desirable
to distinguish between those situations in
which the probability of a case falling in a class
is known a priori and those in which it can only
be surmised within limits from the observed
proportions in the classes. If people are classified
into six vocational groups and the observed
frequencies, when 100 are drawn at random, are
18, 16, 22, 13, 14, 17, we can estimate the
probability of a measure falling in a given class.
We would estimate the chance of a person being
in the first class as .18 and can use this estimate,
with precautions to be noted, in judging
whether a subsequent sample is one drawn from the
same population. This is the most usual
qualitative series.

Another, not differing in face appearance but
actually differing, exists when one knows, because
of antecedent knowledge, the probabilities of a
measure falling in the different classes. Such
a series is a stochastic series. If a homogeneous
cube with red, yellow, blue, violet, orange, and
green faces is tossed 100 times the frequency
of occurrence of the various faces up might be


Color    red    yellow    blue    violet    orange    green
f         18      16       22       13        14        17


We can make tests of this series which are not
possible with the other, for the a priori
probabilities are 1/6 for each class.

Had the cube been a die with 1, 2, 3, 4, 5 
and 6 pips upon its faces, the frequencies of oc- 
currence would have the same fundamental proper- 
ties as before, though the X score is now a 
quantitative one. For some purposes the number 
of pips should be treated merely as categories, 
but for other purposes, as for example when paying
gambling debts, they must be treated as quantitative
measures. In the series involving the
cube with colored faces some genius might be able
to see a quantitative scale and use the results 
quantitatively, but the ordinary non-physicist 
could only treat it as a categorical series. 
This observation is made because it is true that 
series which seem to be merely qualitative may 
frequently be found, with further study, to be 
quantitative in a respect that has scientific 
and social consequences. A good example of this
in psychology is in the quantitative measures 
that have been teased out of qualitative expres- 
sions of interests and attitudes. The enrich- 
ment of understanding is so great when qualita- 
tive data can be recast into a quantitative mold 
that the idea of doing so should be the initial 
thought of a student when first confronted with 
a qualitative series. 

We will first derive a few statistics pertaining
to a stochastic variable. Let us know a
priori that the probability of a measure drawn
at random falling in class A is p and that the
probability of its not so falling is q ($p + q = 1$).
Let X = the number found in class A when a sample
of N is drawn. If N = 1 then X = 0 or 1. The
theoretical, or true, distribution would be the
average of many such samples, and would have mean
and moments as in Table IX A herewith.


TABLE IX A


X    PROBABILITY    f = PROB. NO. WHEN T SAMPLES OF 1 EACH ARE DRAWN

0         q              Tq
1         p              Tp


M (mean of X's) = $p$

$V = p - p^2 = pq$, by [6:05]

$\mu_3 = p - 3p^2 + 2p^3 = pq(q - p)$, by [6:47]

$\mu_4 = p - 4p^2 + 6p^3 - 3p^4 = pq(1 - 3p + 3p^2)$, by [6:48]


In a similar manner we can compute the moments 
of the X distribution if T (a large number) sam- 
ples of 2 each are drawn. 


The chance that the first case = 0 is q
The chance that the second case = 0 is q
The chance that both cases = 0 is $q^2$

The chance that the first = 0 and the second = 1 is pq
The chance that the second = 0 and the first = 1 is qp
The chance that the one = 0 and the other = 1 is 2pq

The chance that the first case = 1 is p
The chance that the second case = 1 is p
The chance that both cases = 1 is $p^2$


The sum of these chances $(p^2 + 2pq + q^2) = (p + q)^2 = 1$,
as of course must be the case. When samples of N
are drawn the successive probabilities are the
successive terms of $(p + q)^N$. We could compute
moments, as in Table IX A, having X = 0, 1 and 2,
and probabilities $q^2$, $2pq$ and $p^2$, but will proceed
immediately to the general case of the distribution
of the frequencies in a class, for which the
probability of a measure drawn at random being in
the class is p, when T (a large number) samples
of N each are drawn. The $(fX_i)$ in column 3 is
the entry under fX in column 2 for X = i, for
example $(fX_1) = TNp[q^{N-1}]$.


TABLE IX B
COMPUTATION OF THE MOMENTS OF THE FREQUENCY IN A CLASS

(f = prob. no. when T samples of N each are drawn)

X      f                                                   fX                                               fX^2

0      $Tq^N$                                              0                                                0
1      $TNq^{N-1}p$                                        $TNp[q^{N-1}]$                                   $(fX_1)$
2      $T\frac{N(N-1)}{1 \times 2}q^{N-2}p^2$              $TNp[(N-1)q^{N-2}p]$                             $(fX_2) + TN(N-1)p^2[q^{N-2}]$
3      $T\frac{N(N-1)(N-2)}{1 \times 2 \times 3}q^{N-3}p^3$    $TNp[\frac{(N-1)(N-2)}{1 \times 2}q^{N-3}p^2]$    $(fX_3) + TN(N-1)p^2[(N-2)q^{N-3}p]$
.      .                                                   .                                                .
N      $Tp^N$                                              $TNp[p^{N-1}]$                                   $(fX_N) + TN(N-1)p^2[p^{N-2}]$

Sums   $T$                                                 $TNp(p+q)^{N-1} = TNp$                           $TNp + TN(N-1)p^2(p+q)^{N-2} = TNp(1 + Np - p)$
Means                                                      $Np$                                             $Np(1 + Np - p)$


Mean of the frequencies in the class = $Np$  [9:01]
Variance = $Np(1 + Np - p) - (Np)^2$

         = $Np(1 - p) = Npq$  (see [14:21])  [9:02]


By similar calculation with $fX^3$ and $fX^4$, or more
simply by utilizing [14:18], obtain


$\mu_3 = Npq(q - p)$  (see [14:22])  [9:03]


$\mu_4 = Npq[1 + 3(N - 2)pq]$  (see [14:23])  [9:04]


$\beta_1 = \mu_3^2 / V^3 = (q - p)^2 / (Npq)$   Skewness (Pearson) of frequency in a class  [9:05]


$\beta_2 = \mu_4 / V^2 = 3 + (1 - 6pq) / (Npq)$   Kurtosis (Pearson) of frequency in a class  [9:06]


The higher moments can be gotten by Romanovsky's 
reduction formula [14:18]. Thus the complete 
distribution of frequencies in a class, due to 
random sampling, is available. 
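
The closed forms [9:01] to [9:04] can be checked numerically. The Python sketch below is not part of the original text; the function names and the trial values N = 10, p = .3 are illustrative assumptions. It sums directly over the successive terms of $(p + q)^N$ and sets the result beside the formulas. With N = 1 the same check reproduces the moments of Table IX A.

```python
from math import comb


def binomial_central_moments(n, p):
    """Mean and 2nd to 4th central moments of the frequency in a class,
    found by direct summation over the terms of (p + q)^n."""
    q = 1.0 - p
    probs = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * pr for x, pr in enumerate(probs))

    def mu(k):
        # k-th moment about the mean
        return sum((x - mean) ** k * pr for x, pr in enumerate(probs))

    return mean, mu(2), mu(3), mu(4)


def formula_moments(n, p):
    """Closed forms [9:01], [9:02], [9:03], [9:04]."""
    q = 1.0 - p
    return (
        n * p,
        n * p * q,
        n * p * q * (q - p),
        n * p * q * (1 + 3 * (n - 2) * p * q),
    )


n, p = 10, 0.3  # illustrative values, not taken from the text
direct = binomial_central_moments(n, p)
closed = formula_moments(n, p)
for name, a, b in zip(("mean", "V", "mu3", "mu4"), direct, closed):
    print(f"{name}: direct = {a:.6f}, formula = {b:.6f}")
```

Direct summation is practical only for moderate N; the formulas serve for any N.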

Let us examine this distribution for different
values of N and p.

If p is very small and N large then the moments
V and $\mu_3$ approach the value M, and the
higher moments are given by [14:34], [14:35],
[14:36], etc. This is a Poisson distribution.
The particular fact that M = V underlies the
computation of $\chi^2$ based upon frequencies in classes.

If N is large and p not very small then $\beta_1 \approx 0$,
which is normal skewness, and $\beta_2 \approx 3$, which is
mesokurtosis, and all higher moments approximate
the normal distribution values. The approach to
the normal distribution is most rapid if p = q.

For all values of N we note that $\beta_2 = 3$ when
$(1 - 6pq) = 0$, i.e., when $p = .5 \pm \sqrt{3}/6$ = .211325
or .788675. The distribution is platykurtic for
values of p between these two values. Otherwise
it is leptokurtic, but the deviation from
mesokurtosis is slight if N is even of moderate size.

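
The behaviour of the skewness and kurtosis just described can be explored with a short Python sketch. It is offered only as an illustration: the function names and the trial values of N and p are assumptions, while the formulas are [9:05] and [9:06].

```python
from math import sqrt


def beta1(n, p):
    """Pearson skewness [9:05] of the frequency in a class."""
    q = 1.0 - p
    return (q - p) ** 2 / (n * p * q)


def beta2(n, p):
    """Pearson kurtosis [9:06] of the frequency in a class."""
    q = 1.0 - p
    return 3.0 + (1.0 - 6.0 * p * q) / (n * p * q)


# Values of p at which 1 - 6pq = 0, so that beta2 = 3 for every N.
p_low = 0.5 - sqrt(3.0) / 6.0
p_high = 0.5 + sqrt(3.0) / 6.0

# Poisson-like case: p very small and N large (illustrative values).
n_big, p_small = 10_000, 0.0002
mean = n_big * p_small                          # [9:01]
variance = mean * (1.0 - p_small)               # [9:02]
mu3 = variance * ((1.0 - p_small) - p_small)    # [9:03]

print(p_low, p_high)
print(beta2(10, 0.5), beta2(10, 0.05))
print(mean, variance, mu3)
```

Between the two boundary values of p the kurtosis falls below 3 (platykurtic); outside them it exceeds 3 (leptokurtic); and in the Poisson-like case the variance and third moment lie close to the mean.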
We can use the binomial distribution to get a 
precise test of the significance of a difference 
between an observed and a theoretical class fre- 
quency. With no other information to help in a 
solution let us be given the fact that Joe matches 
a coin with John and wins eight times in ten. 
What is the likelihood that, if the coins are 
unbiased and the throws fair, so one-sided an 
outcome would arise as a matter of chance? The 
assumption is that p = q = .5, so the distribution
of probabilities is that given by successive
terms of $(.5 + .5)^{10}$, which are as follows:


TABLE IX C
CERTAIN BINOMIAL PROBABILITIES


Joe wins = X      Probability of occurrence

      0               .00097 65625
      1               .00976 56250
      2               .04394 53125
      3               .11718 75000
      4               .20507 81250
      5               .24609 37500
      6               .20507 81250
      7               .11718 75000
      8               .04394 53125
      9               .00976 56250
     10               .00097 65625

   Total             1.00000 00000


Thus the chance probability that Joe would win
eight times in ten is .0439453125. The probability
that he would have eight or more successes
is .0546875 (= .0009765625 + .0097656250 +
.0439453125). The probability that an outcome
as extreme as this would arise as a matter of
chance is double this, or .109375. This last is
the figure that ordinarily concerns one. It is
the P of goodness-of-fit tests and states the
chance that a situation as extreme as the one
observed would arise as a matter of chance.
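
The exact computation for the coin-matching problem can be reproduced with a few lines of Python. This is a minimal sketch, with function and variable names chosen here for illustration; doubling the one-sided probability is appropriate only because the distribution is symmetrical when p = q = .5.

```python
from math import comb


def binomial_terms(n, p):
    """Successive terms of (q + p)^n: probability of X = 0, 1, ..., n."""
    q = 1.0 - p
    return [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]


terms = binomial_terms(10, 0.5)

p_exactly_8 = terms[8]          # Joe wins exactly eight times in ten
p_8_or_more = sum(terms[8:])    # eight, nine, or ten wins
p_as_extreme = 2 * p_8_or_more  # both tails; valid here by symmetry

print(p_exactly_8, p_8_or_more, p_as_extreme)
```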

The usual way of obtaining an approximate P is
by means of $\chi^2$, the square (or squared) contingency.
We will apply it to the present problem
and compare the P therefrom with the correct
answer, .109375.


SECTION 3. CHI-SQUARE 


Let $f$ = the observed frequency in the cell.  [9:07]


Let $\tilde{f}$ = the frequency in this cell according
to the theory being tested.  [9:08]


$(f - \tilde{f})$ = the cell divergence.  [9:09]


$(f - \tilde{f})^2$ = the cell square divergence.


$(f - \tilde{f})^2 / \tilde{f}$ = the cell square contingency.  [9:10]


$\chi^2 = \sum (f - \tilde{f})^2 / \tilde{f}$ = the sum of such
for all cells = the square contingency.  [9:11]


The number of degrees of freedom in $\chi^2$, as fully
explained in Section 4, is here designated d.o.f.
$F_{ij}$ is a variance ratio with i degrees of freedom
in the numerator and j in the denominator.

$\chi^2 / \text{d.o.f.} = F_{\text{d.o.f.},\infty}$ is a variance ratio with
d.o.f. degrees of freedom in the numerator and
infinity degrees of freedom in the denominator.


Р from F is the chance that a situation deviating 
from the theoretical one as much as does the observed
one would arise as a matter of chance. For
the coin matching problem, P from $\chi^2$ will be only
an approximate answer because the $\chi^2$/d.o.f. and
the F distributions derive from the assumption
that the elementary variable is normally distributed,
and reference to [9:05] and [9:06] shows this
not to be strictly true for frequencies in a class.
The cell square contingency is an approximation
to a critical ratio squared for, if p is small,
$\tilde{f} = Np \approx Npq = V$. Wilson, Hilferty, and Maher*
(1931) observe for a more extreme example than
ordinarily occurs that it makes no particular
difference whether Np or Npq is employed in the
denominator when calculating the cell square
contingencies. We may therefore think of the cell
square contingency as very similar to a squared
critical ratio. If these separately are normally
distributed the sum of a number of such is
precisely $\chi^2$, as tabled. For the coin matching
problem we have:


                                      Number of     Number of
                                      Successes     Failures

$f$                                       8             2
$\tilde{f}$                               5             5
$(f - \tilde{f})$                         3            -3
$(f - \tilde{f})^2 / \tilde{f}$           1.8           1.8


The four entries in each cell are, in order,

$f$ = the observed cell frequency

$\tilde{f}$ = the theoretical cell frequency

$(f - \tilde{f})$ = the cell divergence

$(f - \tilde{f})^2 / \tilde{f}$ = the cell square contingency


* Wilson, Hilferty, and Maher note that with k cells and (k-1)
d.o.f., $\chi^2/(k-1)$ and $\chi'^2/k$, in which
$\chi'^2 = \sum [(f - \tilde{f})^2 / (Npq)]$, are very similarly distributed.


The number of degrees of freedom is 1 for, with
10 matchings and 8 successes, the number of failures,
2, is dictated. $\chi^2$/d.o.f. = 3.6/1 and from
tables (either a $\chi^2$ with 1 degree of freedom; an
$F_{1,\infty}$ with 1 degree of freedom in the numerator
and $\infty$ in the denominator; a $t^2$ with $\infty$ degrees of
freedom in the denominator; or an $x^2$ of a normal
distribution) we find P = .05775. This result
is not close to the correct answer, but Yates
(1938) has pointed out the need for a correction
in $\chi^2$, in case of one degree of freedom. The
correction is accomplished by "reducing by .5 the
values which are greater than expectation and
increasing by .5 those which are less than
expectation."

The intent of Yates' correction is to adjust 
for the fact that observed frequencies must be 
integers, while the theoretical distribution 
(the $\chi^2$ distribution) is continuous. If, in a
continuous distribution, a value such as 7.51 
must be assigned an integral value it is called 
8.0, which is to say that in fact the value 8.0 
covers all values from 7.5 to 8.5. 

If we therefore consider our observed number
of successes to be 7.5, and of failures 2.5, and
recompute, we obtain $\chi^2$ = 2.5 and P = .11385,
which is in fair agreement with the precise value
(.109375).
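
The uncorrected and corrected computations can be set side by side in a short Python sketch. The function names are illustrative assumptions. With one degree of freedom $\chi^2$ is the square of a unit normal deviate, so P is obtained here from the complementary error function rather than from a printed table; for that reason the uncorrected P comes out near .0578, a little different from the tabular reading .05775 quoted above.

```python
from math import erfc, sqrt


def chi_square(observed, theoretical, yates=False):
    """Sum of the cell square contingencies, with Yates' correction if asked."""
    total = 0.0
    for f, f_theory in zip(observed, theoretical):
        divergence = abs(f - f_theory)
        if yates:
            # Move each observed frequency .5 nearer to expectation.
            divergence = max(divergence - 0.5, 0.0)
        total += divergence**2 / f_theory
    return total


def p_one_degree_of_freedom(chi2):
    """Chance of a chi-square this large or larger when d.o.f. = 1."""
    return erfc(sqrt(chi2 / 2.0))


observed = [8, 2]        # successes, failures
theoretical = [5, 5]     # p = q = .5 with 10 matchings

chi2_raw = chi_square(observed, theoretical)
chi2_yates = chi_square(observed, theoretical, yates=True)
p_raw = p_one_degree_of_freedom(chi2_raw)
p_yates = p_one_degree_of_freedom(chi2_yates)

print(chi2_raw, p_raw)
print(chi2_yates, p_yates)
```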


SECTION 4. $\chi^2$ FROM THE GENERAL CONTINGENCY TABLE


Let us now consider the problem wherein a priori
probabilities for the different cells are unknown.
If 100 people, drawn at random in city $\alpha$,
200 drawn at random in city $\beta$, and 300 drawn
at random in city $\gamma$ fall into vocational groups
as given in Table IX D, we can ask and answer the
question "Are the discrepancies in the relative
frequencies for these three series such as might
readily arise as a matter of sampling if all
three drawings are in fact from the same parent
population, or do they exceed this so that we
must believe that they come from different parent
populations, i.e., that communities $\alpha$, $\beta$ and $\gamma$
are of different sorts?"

We must compare these observed cell frequencies
with theoretical cell frequencies, but we
have no a priori values. However, on the assumption
that the three communities are the same, we
can assign theoretical values which are in agreement
with the marginal totals $N_s$ and $N_t$. To
illustrate for cell $\gamma A$: Half the cases are $\gamma$ cases
and 1/6 of them are A cases, so clearly 1/2
times 1/6 of them, or 1/12 of them, [(1/2)(1/6)
600 = 50], must, in the population, be $\gamma A$ cases.
This, accordingly, is $\tilde{f}_{\gamma A}$, the theoretical $\gamma A$
cell frequency. In general, $\frac{N_s}{N} \times \frac{N_t}{N} \times N$ is the
theoretical cell frequency for the cell at the
intersection of row s and column t, which we
designate cell st.

The 600 people whose status is recorded in
Table IX D constitute a sample. They are, with
certain limitations, to be thought of as randomly
chosen from an infinite population. If there
were no limitations, a second sample might not
have 600 cases, or 100 from City $\alpha$, 100 in
Vocation A, etc., but we are not interested in the
facts that there were 600 cases, 100 from City $\alpha$,
100 in Vocation A, etc. That there were 100 from
City $\alpha$ was probably dictated by the method of
sampling, and that there are 100 in Vocation A,
though perhaps not dictated by the method of
collecting the data, is nevertheless not the issue
that concerns us, which is the relative frequencies
for the different cities of those in a certain
vocation. We shall therefore look upon this
particular sample as one of an infinite number
of possible samples for each of which the same
marginal totals, 100, 200, 300, 100, 90, 140, 75,
85, 110, maintain. The theory that we shall
test (are the relative frequencies in the vocations
from city to city such as might be expected
due to sampling?) is a posteriori to the knowledge
of the marginal totals, which is to say that
these marginal totals are an intrinsic part of
the hypothesis and not issues being tested. Thus
the hypothesis being tested is subject to the
following restrictions upon the class frequencies:

$f_{\alpha A} + f_{\alpha B} + f_{\alpha C} + f_{\alpha D} + f_{\alpha E} + f_{\alpha F} = N_\alpha$

$f_{\beta A} + f_{\beta B} + f_{\beta C} + f_{\beta D} + f_{\beta E} + f_{\beta F} = N_\beta$

$f_{\gamma A} + f_{\gamma B} + f_{\gamma C} + f_{\gamma D} + f_{\gamma E} + f_{\gamma F} = N_\gamma$   [9:13]

$f_{\alpha A} + f_{\beta A} + f_{\gamma A} = N_A$

$f_{\alpha B} + f_{\beta B} + f_{\gamma B} = N_B$

$f_{\alpha C} + f_{\beta C} + f_{\gamma C} = N_C$

$f_{\alpha D} + f_{\beta D} + f_{\gamma D} = N_D$

$f_{\alpha E} + f_{\beta E} + f_{\gamma E} = N_E$


There are eight restrictions upon the frequencies
in the classes. These restrictions are of three
sorts: one is connected with N, the size of the
particular sample; the number of the second sort
is one less than the number of rows; and the
number of the third sort is one less than the number
of columns. None of these restrictions can be
derived from the others, so there are just eight
independent linear restrictions. Such a restriction as

$f_{\alpha A} + f_{\alpha B} + \cdots + f_{\gamma F} = N$

is not included for it is derivable from the
first three given.

Having thus fixed the marginal totals we know
the exact theoretical cell frequencies, under the
hypothesis. For any cell, such as that at the
intersection of the s row and the t column, this
theoretical cell frequency is


$\tilde{f}_{st} = \frac{N_s}{N} \times \frac{N_t}{N} \times N = p_s p_t N$   Theoretical cell frequency under hypothesis of independence  [9:15]


in which $p_s$ is the proportion in row s and $p_t$
the proportion in column t and N the size of the
sample.
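
The theoretical cell frequencies follow mechanically from the marginal totals quoted in the text (100, 200, 300 for the communities and 100, 90, 140, 75, 85, 110 for the vocations). The Python sketch below is illustrative only: the dictionary keys and function name are assumptions, and no observed cell frequencies are used, since only the margins enter the computation. It also counts the cells and the independent linear restrictions, whose difference is the number of degrees of freedom.

```python
row_totals = {"alpha": 100, "beta": 200, "gamma": 300}
column_totals = {"A": 100, "B": 90, "C": 140, "D": 75, "E": 85, "F": 110}


def theoretical_frequencies(rows, columns):
    """Theoretical frequency for each cell st: (N_s / N) * (N_t / N) * N."""
    n = sum(rows.values())
    if n != sum(columns.values()):
        raise ValueError("row and column totals must give the same N")
    return {
        (s, t): n_s * n_t / n
        for s, n_s in rows.items()
        for t, n_t in columns.items()
    }


theory = theoretical_frequencies(row_totals, column_totals)

cells = len(theory)
# One restriction for N, rows - 1 for the rows, columns - 1 for the columns.
restrictions = 1 + (len(row_totals) - 1) + (len(column_totals) - 1)
degrees_of_freedom = cells - restrictions

print(theory[("gamma", "A")])
print(cells, restrictions, degrees_of_freedom)
```

The cell $\gamma A$ receives 50, agreeing with the illustration in the text, and the count of cells less restrictions is the same as (rows - 1)(columns - 1).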

An elementary theorem of probability is that the
probability of the joint occurrence of any number
of independent events is the product of the
probabilities of these events separately. The
probability of being in row s is $p_s$, in column t
it is $p_t$, so the probability of a case falling in
cell st is $p_s p_t$, and the "expected" number in
this cell is $p_s p_t N$, and is frequently designated
$E_{st}$. We shall employ $\tilde{f}_{st}$ for, of course, we
should not actually expect that for any cell a
sample will have the exact relative frequency of
the population.

To reinterpret: The statement in the first 
sentence of this section, that a priori class 
frequencies are unknown is false when the hypoth- 
esis being tested is so cast as to be contingent 
upon the observed marginal totals. Under a hy- 
pothesis so conditioned we do exactly know the a 
priori, or theoretical cell frequencies. 

Clearly, by and large, the observed class frequencies
cannot vary as much from the theoretical
class frequencies if there are linear restrictions
placed upon the class frequencies as if
there are no such restrictions. It has been
established by Fisher (1922) that a total $\chi^2$ from
a table having (i + s) cells and s independent
linear restrictions upon the cell frequencies
has exactly the same form of distribution as a
$\chi^2$ from a table having i cells and no restrictions
upon the cell frequencies. It is not the
number of cells in the table, but i, the number
of degrees of freedom, which is crucial in
determining the $\chi^2$ distribution.


TABLE IX D
VOCATIONAL FREQUENCIES FOR DIFFERENT COMMUNITIES

[Body of table lost in reproduction. Marginal totals: communities
$\alpha$, $\beta$, $\gamma$ = 100, 200, 300; vocations A to F = 100, 90, 140, 75, 85, 110.]

$\chi^2$ = 10.134

d.o.f. = 10

$\chi^2$/d.o.f. = 1.0134


The four entries in a cell are, in order

$f_{st}$ = the observed cell frequency  [9:16]

$\tilde{f}_{st}$ = the theoretical cell frequency  [9:17]

$(f_{st} - \tilde{f}_{st})$ = the cell divergence  [9:18]

$(f_{st} - \tilde{f}_{st})^2 / \tilde{f}_{st} = \chi^2_{st}$ = the cell square contingency  [9:19]


For the table entire we have,

$\chi^2 = \sum_s \sum_t \chi^2_{st}$  [9:20]


SECTION 5. TRANSFORMATIONS NORMALIZING THE 
VARIANCE RATIO DISTRIBUTION 


$V_i$ ($\sum \chi^2$ as given in [8:38]) is a variance of
the sum of i independent standard scores, and $V_j$
another similar variance, independent of the
former, having j degrees of freedom.

$F_{ij} = \frac{V_i / i}{V_j / j}$   A variance ratio having i d.o.f. in the numerator and j d.o.f. in the denominator  [9:21]


If the hypothesis is that $V_i/i = V_j/j$, the
divergence of $F_{ij}$ from 1.00 (more precisely from
its median value) is a measure of improbability
of the hypothesis. We obtain a probability
measure of any such divergence as follows:


Let $f_{ij} = \sqrt[3]{F_{ij}}$  [9:22]


Let $\theta_i = \frac{3}{\sqrt{2}} - \frac{\sqrt{2}}{3i}$ and $\theta_j = \frac{3}{\sqrt{2}} - \frac{\sqrt{2}}{3j}$  [9:23]


Table IX E, which is an abridgement of the table
of $\theta_i$ values given in The Kelley Statistical
Tables, revision of 1947, facilitates the use of
[9:24].


TABLE IX E
TABLE OF $\theta$ VALUES

   i   $\theta_i$        i   $\theta_i$        i   $\theta_i$         i   $\theta_i$

   1   1.6499       17   2.0936       33   2.1070        49   2.1117
   2   1.8856       18   2.0951       34   2.1075        50   2.1119
   3   1.9642       19   2.0965       35   2.1079        60   2.1135
   4   2.0035       20   2.0978       36   2.1082        70   2.1146
   5   2.0270       21   2.0989       37   2.1086        80   2.1154
   6   2.0428       22   2.0999       38   2.1089        90   2.1161
   7   2.0540       23   2.1008       39   2.1092       100   2.1166
   8   2.0624       24   2.1017       40   2.1095       200   2.1190
   9   2.0689       25   2.1025       41   2.1098       300   2.1197
  10   2.0742       26   2.1032       42   2.1101       400   2.1201
  11   2.0785       27   2.1039       43   2.1104       500   2.1204
  12   2.0820       28   2.1045       44   2.1106       600   2.1205
  13   2.0851       29   2.1051       45   2.1108       700   2.1206
  14   2.0876       30   2.1056       46   2.1111       800   2.1207
  15   2.0899       31   2.1061       47   2.1113       900   2.1208
  16   2.0919       32   2.1066       48   2.1115      1000   2.1208


6!! < „0002

$d = \dfrac{-\theta_i + \theta_j f_{ij}}{\sqrt{\dfrac{1}{i} + \dfrac{1}{j} f_{ij}^2}}$   the d normalizing transformation  [9:24]

Treating d as a deviate in a unit normal distribution
will yield q, the proportion to the right
of the point d. This is the desired P for $F_{ij}$,
the probability that a variance ratio as great
as that observed would arise as a matter of chance
under the hypothesis that $V_i/i = V_j/j$. This
approximation is close [see Kelley, Tables (1947)]
if j > 3, and if j < 3 the xd transformation
herewith is close.
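
The d transformation lends itself to direct computation. The Python sketch below rests on two assumptions that should be stated plainly: $\theta$ is taken as $3/\sqrt{2} - \sqrt{2}/(3 \times \text{d.o.f.})$, a closed form that reproduces the entries of Table IX E to four decimals, and d is taken as $(-\theta_i + \theta_j f_{ij}) / \sqrt{1/i + f_{ij}^2/j}$ with $f_{ij}$ the cube root of $F_{ij}$. The function names are illustrative. The trial values F = 1.0134, i = 10, j = $\infty$ are those of the city-vocation example worked in this section.

```python
from math import erfc, inf, sqrt


def theta(dof):
    """Assumed closed form for the tabled theta values (see Table IX E)."""
    return 3.0 / sqrt(2.0) - sqrt(2.0) / (3.0 * dof)


def d_transform(F, i, j):
    """Normal deviate d for a variance ratio F with i and j degrees of freedom."""
    f = F ** (1.0 / 3.0)                      # cube root of F
    return (-theta(i) + theta(j) * f) / sqrt(1.0 / i + f * f / j)


def upper_tail(d):
    """Proportion of a unit normal distribution to the right of d."""
    return 0.5 * erfc(d / sqrt(2.0))


d = d_transform(1.0134, 10, inf)   # j = infinity, i.e. chi-square / d.o.f.
P = upper_tail(d)

print(theta(10), d, P)
```

Using the unrounded $\theta_{10}$ gives a d about .0001 larger than the value obtained from the four-decimal table entry; the resulting P is unaffected to three decimals.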


. 0809 i 
x = d(l- м аео 
1 when j < 3) 


A P from d [9:24], or from x [9:25], having a
value in the neighborhood of .50 is most confirmatory
of the hypothesis. A small value of P
tends to disprove it by asserting that a situation
as extreme as the observed situation is
very unlikely to arise as a matter of chance and
a large value of P tends to disprove it by
asserting that as close a fit (agreement between
$V_i/i$ and $V_j/j$) as the observed fit is very
unlikely to arise as a matter of chance. We
elaborate upon this point in the next paragraphs for
it has frequently been misunderstood.


We may write $V_i = V_{i(\text{non-chance})} + V_{i(\text{chance})}$,
that is, the observed variance differs from the
underlying true variance by an amount, $V_{i(\text{chance})}$,
commonly called the error variance, which is
due to sampling. Also $V_j = V_{j(\text{non-chance})} + V_{j(\text{chance})}$,
though in most cases the hypothesis
is such that $V_{j(\text{non-chance})} = 0$. Then $V_j = V_{j(\text{chance})}$
and the $F_{ij}$ test is to see if $V_{i(\text{non-chance})}$ exists.
Expressed in terms of
the city-vocation distribution problem the test
is to see if the relative differences in frequencies
of vocations from city to city are such
that chance is insufficient to account for them.
Thus $F_{ij}$ becomes


$F_{ij} = \dfrac{[V_{i(\text{non-chance})} + V_{i(\text{chance})}] / i}{V_{j(\text{chance})} / j}$


Since $V_{i(\text{chance})}/i$ only differs from $V_{j(\text{chance})}/j$
as a matter of chance the variance ratio hovers
around the value 1.0 when $V_{i(\text{non-chance})} = 0$.


If $V_{i(\text{non-chance})}$ is substantial and
$V_{i(\text{chance})}/i = V_{j(\text{chance})}/j$
then $F_{ij}$ is much greater than 1.0 and P is small.
This situation is easy to interpret. We simply
believe the hypothesis disproved because of the
presence in $V_i$ of non-chance factors.

If $F_{ij}$ is much less than 1.0 we not only must
believe that $V_{i(\text{non-chance})}$ is nonexistent or
negligible, but also that $V_{i(\text{chance})}$ is unreasonably
small, again disproving the hypothesis.
This situation is the more difficult to account
for. It can be brought about (as certain poorly
devised studies bear witness to) if an experimental
procedure has employed such controls as
to eliminate to a degree the operation of the
same chance factors in $V_i$ as operate in $V_j$.

In short, if $F_{ij}$ is large and P small consider
the hypothesis untenable because of variability
factors in $V_i$ in excess of the hypothesis, and
if $F_{ij}$ is small and P large consider it untenable
because of faulty controls in the experimental
procedure. This statement of the case holds
when the denominator variance, $V_j$, is the chance
variance. The writer recommends that $F_{ij}$ be so
written that this is always the case no matter
if this result in an $F_{ij}$ which is less than 1.0.
In order to use certain tables it has been customary
so to write F that it is greater than 1.0.
It is only necessary to note, if $P_{ij}$ is P from
$F_{ij}$ and $P_{ji}$ is P from $F_{ji}$, that

$P_{ij} = 1 - P_{ji}$  [9:26]

to see that no matter how it is written to conform
to tables it can always be written with the
error variance in the denominator for purposes
of interpretation.
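
That inverting the ratio and interchanging the degrees of freedom yields the complementary probability can be illustrated numerically. The sketch below reads the relation as $P_{ij} = 1 - P_{ji}$ and uses the d transformation with an assumed closed form for $\theta$ (one that agrees with Table IX E); the values F = .40, i = 6, j = 12 are hypothetical and chosen only for illustration.

```python
from math import erfc, sqrt


def theta(dof):
    """Assumed closed form consistent with the entries of Table IX E."""
    return 3.0 / sqrt(2.0) - sqrt(2.0) / (3.0 * dof)


def p_from_f(F, i, j):
    """Upper-tail P for F with i and j degrees of freedom, via d."""
    f = F ** (1.0 / 3.0)
    d = (-theta(i) + theta(j) * f) / sqrt(1.0 / i + f * f / j)
    return 0.5 * erfc(d / sqrt(2.0))


F, i, j = 0.40, 6, 12            # hypothetical ratio, less than 1.0
p_ij = p_from_f(F, i, j)         # error variance kept in the denominator
p_ji = p_from_f(1.0 / F, j, i)   # ratio inverted so as to exceed 1.0

print(p_ij, p_ji, p_ij + p_ji)
```

Under this transformation the deviate for the inverted ratio is exactly the negative of the original deviate, so the two probabilities sum to 1.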


Since $\chi^2$/d.o.f. from Table IX D is equal to
$V_i/i$ in which i = 10, we have


$F_{10,\infty} = \frac{10.134}{10} = 1.0134$

Since $F_{10,\infty}$ so nearly equals 1.0 we can, without
computing P, be convinced that cell-divergences
from theoretical values as great as the observed
divergences could readily arise as a matter of
chance. The determination of P merely confirms
this. We have


$F_{10,\infty}$ = 1.0134          $\theta_\infty$ = 2.121320
$f_{10,\infty}$ = 1.00445         $\sqrt{1/i}$ = .316228
$\theta_{10}$ = 2.0742            d = .17885
                                  P = .42903*


* P computed by linear interpolation in Pearson's
Tables of $\chi^2$ (1914) is .4294.

Thus there are 43 chances in 100 that if $(V_i/i)$
= 1.0 a sample value as large as 1.0134 would
occur.

We find the data highly consistent with the 
hypothesis tested, but we should not say that 
the hypothesis has been "proven", for the data 
may be consistent with any number of other hy- 
potheses. Though a high or a low P can be taken 
as "disproof" of the hypothesis, one in the 
neighborhood of .5 is only indicative of consis- 
tency with the hypothesis, not of its proof. We 
never prove a hypothesis in the sense that it 
and none other is tenable, but only at times 
have evidence which does not disprove it. 

The accuracy of P from the d, or the xd-transformation,
extends to the third or fourth decimal
places and is sufficient for all experimental
purposes. A more serious hazard is consequent
to the assumption that $(f_{st} - \tilde{f}_{st})$ is normally
distributed. The actual divergence from normality
is due to the fact that $f_{st}$ takes integral values
only, and further, if $p_{st}$ and N are both small
the distribution of $(f_{st} - \tilde{f}_{st})$ approaches a Poisson
and not a normal distribution. To lessen this
hazard Kelley (1923) recommended that $(p_{st}N)$ be
> 1; Fisher and Yates (1938) recommend such a
grouping that $(p_{st}N)$ be not < 5.0; and Cochran
(1942) provides data upon the requisite minimal
size of $(p_{st}N)$ at the one and the five per cent
levels of significance for different degrees of
freedom. He writes, "With only a single small
expectation [i.e., only one cell with small
$(p_{st}N)$] the conventional limit of five for the
smallest expectation appears unduly high. At the
5 per cent level, the tabular $\chi^2$ distribution may
be used without undue error with an expectation
as low as 0.5, and at the one per cent level with
an expectation as low as 2." With one degree of
freedom Yates' correction is to be applied, though
Cochran points out certain infrequent exceptions
to this. A correction of the Yates type is in
order when the number of degrees of freedom
exceeds one, but Cochran has shown that its value
is small and its precise computation rather
complicated.

A fundamental property of $\chi^2$'s is given by the
following equation, in which the subscripts
indicate the respective degrees of freedom:


$\chi_a^2 + \chi_b^2 + \cdots + \chi_k^2 = \chi_{a+b+\cdots+k}^2$   Additive property  [9:27]


P from $\chi_{a+b+\cdots+k}^2$ is the probability that chance
would yield divergences-squared in all the
situations whose sum total is as great as that
observed. This frequently enables the combination
of a number of experiments so as to obtain a
single measure of confidence. However, if the
sign of a divergence is material, the method
does not answer the question of importance. What
is then required is a method based upon deviations
with their sundry plus and minus signs and
not one based upon deviations squared.
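
The additive property can be put to work in a few lines of Python. The three experiments below are hypothetical, and the function names are illustrative. P for the combined $\chi^2$ is approximated by treating $\chi^2$/d.o.f. as a variance ratio with infinite degrees of freedom in the denominator and applying the d transformation (with an assumed closed form for $\theta$ that agrees with Table IX E). Because the combined degrees of freedom happen to be even, an exact closed-form tail is available as a check.

```python
from math import erfc, exp, factorial, sqrt


def combine(results):
    """Add the chi-squares and their degrees of freedom (additive property)."""
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_dof = sum(dof for _, dof in results)
    return total_chi2, total_dof


def p_by_d_transformation(chi2, dof):
    """Approximate P, treating chi2 / dof as F with dof and infinite d.o.f."""
    f = (chi2 / dof) ** (1.0 / 3.0)
    theta_i = 3.0 / sqrt(2.0) - sqrt(2.0) / (3.0 * dof)   # assumed form
    theta_inf = 3.0 / sqrt(2.0)
    d = (-theta_i + theta_inf * f) / sqrt(1.0 / dof)
    return 0.5 * erfc(d / sqrt(2.0))


def p_exact_even_dof(chi2, dof):
    """Exact upper tail of chi-square; valid only when dof is even."""
    half = chi2 / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(dof // 2))


experiments = [(4.2, 2), (6.1, 3), (3.3, 1)]   # hypothetical (chi2, d.o.f.)
chi2_total, dof_total = combine(experiments)
p_approx = p_by_d_transformation(chi2_total, dof_total)
p_exact = p_exact_even_dof(chi2_total, dof_total)

print(chi2_total, dof_total, p_approx, p_exact)
```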

$\chi^2$ has been extensively used in such contingency
situations as the City-Vocation problem
here given as an illustration, but theoretically
it is more exactly applicable to quantitative
problems wherein the variable is both continuous
and more nearly normal than is the frequency in
a class variable. Generally, where quantitative
data are involved, more informative regression,
analysis of variance, and variance ratio ($F_{ij}$,
not merely $F_{i,\infty}$ which = $\chi^2$/d.o.f.) methods are
available, as illustrated in Chapter X. The student
should not believe that the $\chi^2$ method is in
any way limited to contingency data.


CHAPTER X
ESTIMATION, REGRESSION, AND CORRELATION
SECTION 1. A BRIEF PERSPECTIVE OF THE FIELD


It has been claimed that every properly conducted
experiment should hinge upon some null hypothesis.
Most correlation problems are not so
stated. If a new psychological test is being developed
for use in connection with college admission,
the initial null hypothesis that the test
correlates with college success to the extent zero
is not made. If the correlation found is .70 (st.
er. = .03) something very useful is now known that
was not known before, but there has been no null
hypothesis. We shall here be concerned with correlation
and regression methods without regard to
null hypotheses, but will later note how nicely
they fit in with certain null hypotheses and how
gracefully they contribute to the analysis of
variance.

The correlation coefficient, $r_{01}$, is a measure
of mutual or reciprocal relationship and is frequently
informative of itself, though its primary
utility is simply as an essential measure entering
into an estimation equation.

$\bar{X}_0 = a + b_{01} X_1$   Linear regression equation of $X_0$ upon $X_1$  [10:01]

$a = M_0 - b_{01} M_1$  [10:02]

$b_{01} = r_{01} \frac{\sigma_0}{\sigma_1}$  [10:03]

Thus

$\bar{X}_0 = M_0 - r_{01} \frac{\sigma_0}{\sigma_1} M_1 + r_{01} \frac{\sigma_0}{\sigma_1} X_1$  [10:01a]


When deviation scores are employed [10:01] becomes


$\bar{x}_0 = b_{01} x_1 = r_{01} \frac{\sigma_0}{\sigma_1} x_1$  [10:04]


and by parity


$\bar{x}_1 = b_{10} x_0 = r_{01} \frac{\sigma_1}{\sigma_0} x_0$  [10:04a]

When standard scores (see [8:23]) are employed
[10:01] becomes

$\bar{z}_0 = r_{01} z_1$  [10:05]

$\bar{z}_1 = r_{01} z_0$  [10:05a]


revealing the reciprocal nature of the relationship.
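
The relation between the correlation coefficient and the two regression equations can be seen in a small computation. The Python sketch below uses five hypothetical pairs of scores and illustrative function names; it takes the regression coefficient of $X_0$ upon $X_1$ to be $r_{01}\sigma_0/\sigma_1$ and the constant to be $M_0 - b_{01}M_1$, the customary least-squares forms.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation about the mean (divisor N)."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def correlation(x0, x1):
    """Product-moment correlation between two equally long series."""
    m0, m1 = mean(x0), mean(x1)
    covariance = sum((a - m0) * (b - m1) for a, b in zip(x0, x1)) / len(x0)
    return covariance / (sd(x0) * sd(x1))


def regression(dependent, independent):
    """Constant a and slope b for estimating `dependent` from `independent`."""
    r = correlation(dependent, independent)
    b = r * sd(dependent) / sd(independent)
    a = mean(dependent) - b * mean(independent)
    return a, b


x1 = [1, 2, 3, 4, 5]   # hypothetical scores on X_1
x0 = [2, 1, 4, 3, 5]   # hypothetical scores on X_0

r01 = correlation(x0, x1)
a01, b01 = regression(x0, x1)   # X_0 estimated from X_1
a10, b10 = regression(x1, x0)   # X_1 estimated from X_0, "by parity"

print(r01, a01, b01, b10)
```

The product of the two regression coefficients equals $r_{01}^2$, which is another way of seeing that in standard scores both regressions have the single coefficient $r_{01}$.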

There are many correlation formulas because
of the variety of types of data and because of
the need for various corrections in the raw results.
We will designate the four most important
corrections by preceding subscripts: a for attenuation,
c for coarseness of grouping, f for
fineness of grouping, and s for shrinkage. Which
of these corrections applies varies from situation
to situation because of the differences in
types of data and of uses to which results may
be put. The corrections listed in Table X A have
been developed in connection with quantitative
data, and the fineness of grouping correction is
of prime importance in connection with qualitative
data.


TABLE X A 
NOTATION OF SUNDRY CORRECTIONS TO CORRELATION COEFFICIENTS 


$r_{01}$, raw r between quantitative variables $X_0$
and $X_1$


a а Tox also designated r,,, is Гу, corrected for 
attenuating effect of errors of measure- 
ment of X, and X, 


a Tox also designated r,,, is Го, corrected for 
0 2 
attenuating effect of errors of measure- 
ment of X, 


1701, also designated Tot is Гу: corrected for 
attenuating effect of errors of measure- 
ment of X, 


соз е con To,» Го: corrected for equally spaced coarse 
grouping of X, and X, 


Гу, Го: corrected for equally spaced coarse 
grouping of X, 


Гу, Го} corrected for equally spaced coarse 


сү z 
grouping of X, 
oTorede Гол гуу corrected for unequally spaced coarse 


grouping of Х, and Х, 


${}_{s}r_{01}$, $r_{01}$ corrected for shrinkage


г, ОГ T4. with multiple preceding subscripts 
are combinations of the preceding

In case regression is nonlinear one of the
variables, usually $X_1$, may be transformed into a
new variable, $X_1'$, producing substantial linearity
and the correlation $r_{01'}$ computed. This
transformation may frequently be accomplished by a
higher order parabolic regression. If of the
second degree, the situation is equivalent to a
multiple variable problem, involving three variables,
$X_0$, $X_1$, $X_1^2$, the multiple correlation
notation for which is $r_{0.1,1^2}$.

$\bar{\bar{X}}_0 = a + bX_1 + cX_1^2$  [10:06]

The double bar is used to distinguish this estimated
$X_0$ from that given by [10:01]. The correlation
between $X_0$ and the right-hand member of
[10:06] is $r_{0.1,1^2}$ and it may call for a, c, and
s corrections.

This variety of corrections would be very 
troublesome were it not ordinarily possible to 
work with such original data as not to demand 
them. If N, the number of cases in the sample, 
is large the shrinkage correction is negligible. 
The grouping of measures into classes is usually 
upon the basis of equally spaced intervals so the 
c' correction is not involved. Usually 12 or 
more equally spaced classes can be employed, thus 
making the c correction unimportant. The "cor- 
rection for attenuation" adjustment yields an 
estimate of what the correlation would be were 
errors of measurement lacking. It thus answers 
a theoretical question and is not appropriate 
when the problem is to estimate an actual X_0 (not
a hypothetical one without error) from an actual
X_1 (not from a hypothetical one without error).

Though, for a particular situation, these corrections
may not be demanded, it is only after the student has
thought of them in turn and decided that they are
inappropriate (as the a correction), or small (as the
c, s, or nonlinear adjustment) for his problem that he
should refrain from making them, if it is possible to
make them.
Not infrequently the logic of the problem calls 
for a correction for attenuation but the requisite 
data for making the correction are lacking. In 
such instance a definitely expressed reservation 
in interpretation is called for. 

An important classification of correlation 
procedures depends upon the type of variable in- 
volved, as listed in Table X B. 


TABLE X B 
TYPES OF AVAILABLE VARIABLES 


(a) Two-category variable 
(a-1) Values important as given, e.g., male and
female. 


(a-2) Values are crude measures of an underlying 
continuous trait, e.g., above-average and 
below-average intelligence. 


(b) Categorical variable, consisting of several values 
which are not known to be quantitatively related, 
e.g., nationalities. 


(c) Ordered variable, consisting of several values 
which may be placed in an order of magnitude, but 
not quantitatively expressed, e.g., a number of 
people ranked in order for sociability. 


(d) Quantitative variable, e.g., height. 


The field is fairly adequately covered by 
Tables X A and X B, but neither is inclusive of 
all the possible situations that should be dis- 
tinguished because of the statistical consequences 
that follow. Even so, when the various corrections
are combined with the various types of variables
there result over a hundred easily distinguishable
and definable correlation situations.
Of all these situations the most fundamental
one is that which requires no a correction because
the variables are fully meaningful as they
stand, no c correction because each variable is
recorded in 12 or more equally spaced intervals,
no s correction because the size of the sample is
25 or greater, and no adjustment for nonlinearity.

The data of Francis Galton upon the relationship
of height of offspring and of parents, which
meet all these conditions (though Galton used
fewer classes than 12), will be used in the next
two Sections for illustrative purposes. At the
same time, a noting of Galton's study provides a
striking illustration of scientific induction. 


SECTION 2. THE PROBLEM OF CONCOMITANT VARIATION 
IN THE SCIENCES 


A problem common to all sciences is that of 
defining and determining the cause of concomitant 
variation. In this the physical sciences have a 
great advantage over the biological and social 
sciences in that (1) errors of observation and
measurement are usually very small in comparison
with the measurements involved and (2) fewer
factors are ordinarily present. In measuring
some intellectual capacity of a group of children,
it usually happens that the standard errors of 
the test scores obtained are greater than half 
the standard deviation of the scores of the group. 
Obviously any relationship between two capacities, 
each measured with no greater reliability than 
this, will be clouded by the errors of measure- 
ment. This is serious, but it is not the only 
difficulty. In measuring the effect of gravity,
physicists can ordinarily assume that 10 pounds
of lead and 10 pounds of iron will act in a similar
manner, but in measuring intellect, prices, etc.,
to say that one reagent, commodity, etc., is
equivalent to another with respect to the function
being examined, is usually
questionable. Accordingly, whereas the investi- 
gations of physics lead to the establishment of 
"laws", those of the social sciences ordinarily 
lead to the discovery of "tendencies". Relation- 
ships between two psychological, biological, or 
social factors frequently depend upon a number of
causes, each more or less independent, and no one
of such importance as to dominate the situation.
Under these conditions, the relationship tends
to be linear. In other cases, where the true
relationship is nonlinear, large errors of
measurement will lessen the strength of the measurable
relationship, thereby making it more difficult 
to determine the exact nature of whatever curvi- 
linear relationship may exist. It is also true 
that relationships which are intrinsically curvi- 
linear when determined over a range of the two 
variables from very low to very high, may show 
practically linear relationship throughout a 
short stretch of the range. For all the reasons 
stated, a measure of relationship based upon the 
assumption of linearity is of great importance. 
Even in the case of known nonlinear relationship 
it is of much value as a point of departure. The
equation for estimating one variable from a knowledge
of another linearly related to it is called
a linear regression equation [10:01], and in
general the best measure of mutual linear rela- 
tionship is the Pearson product-moment correla- 
tion coefficient. 


Fundamental properties of this measure of relationship
were discovered and presented graphically
by Francis Galton (1877 to 1888). Galton's
investigations had to do with the inheritance of 
traits, and certain of the terms which he used 
would hardly have arisen if the development had 
involved other data. For example, the symbol "r" 
was a measure of "reversion", such, for example, 
as offspring upon mid-parent (a mid-parent measure
is an average of the measures of father and
mother). Later, Galton used the terms "regression"
and "co-relation" and called the measure
the "index of co-relation." Weldon very properly
calls this measure "Galton’s function" and Edg- 
worth, in 1892, gave it the name which has sur- 
vived, "coefficient of correlation." Pearson 
(1920) has pointed out that the product-moment 
function of Bravais bears but a resemblance in 
form to the product-moment coefficient of
correlation. Whereas Bravais started with observations
which were assumed to be independent and obtained 
derived measures whose product moments did not 
equal zero, Galton started with the epic-making 
concept that the original measures were dependent. 
Partial correlation analysis leads to independent 
measures, having given related original measures, 
which is exactly the reverse of the Bravais or 
Gaussian developments. Galton seems deserving of 
being called the father of correlation, though so
ubiquitous a phenomenon as concomitant variation 
in the natural and social sciences did not com- 
pletely escape his predecessors and it has been 
rediscovered by many modern students working in 
fields remote from those in which Galton labored. 


SECTION 3. FINDINGS RESULTING 
FROM GALTON'S GRAPHIC TREATMENT 


Galton's procedure, based upon medians and 
quartile deviations, has given way to the more 
accurate one involving the product-moment formula 
based upon deviations from means and standard 
deviations, as developed by Karl Pearson. 

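    r_{12} = \frac{\sum x_1 x_2}{N \sigma_1 \sigma_2}
           = \frac{\sum X_1 X_2 - N M_1 M_2}{N \sigma_1 \sigma_2}
           = \frac{\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}}    [10:07]

Equivalent formulas for Pearson product-moment correlation
coefficient (see also [10:27] and [10:29])
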
We shall use Galton's data in deriving a measure
of correlation. Galton obtained the heights
of parents and the heights of children, and drew
up a "correlation" table or "scatter diagram"
showing the relationship between the two. All
female heights were multiplied by 1.08 to make
them comparable with male heights. This is not
the soundest procedure, but in this problem leads
to no material error. Letting X_1, X_2 represent
male and female heights, σ_1, σ_2 their standard
deviations, and M_1, M_2 their means, it would have
been better to have reduced each female height
to a comparable male height by the equation

    Comparable male height = M_1 + \frac{\sigma_1}{\sigma_2} (X_2 - M_2)    [10:08]
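
The transformation [10:08] can be sketched in a few lines of Python. This is only an illustration: the heights below are hypothetical values chosen for the example (they are not Galton's data), and the function name is an assumption of this sketch.

```python
from statistics import mean, pstdev

def comparable_male_height(x2, m1, s1, m2, s2):
    """Map a female height x2 onto the male scale, following [10:08]."""
    return m1 + (s1 / s2) * (x2 - m2)

# Hypothetical illustrative heights in inches (not Galton's data).
male = [66.0, 68.0, 69.0, 71.0, 72.0]
female = [61.0, 63.0, 64.0, 65.0, 67.0]

m1, s1 = mean(male), pstdev(male)      # M_1 and sigma_1
m2, s2 = mean(female), pstdev(female)  # M_2 and sigma_2

converted = [comparable_male_height(x, m1, s1, m2, s2) for x in female]
print(converted)
```

After the transformation the converted female heights have the male mean and the male standard deviation, which a simple multiplication by 1.08 does not guarantee.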


The discussion which follows will assume that
the more reliable method of transforming
("transmuting" according to Galton) female into male
heights was followed and also that means were
used throughout. Presumably Galton used medians,
but no fundamental difference in treatment followed
from such use, it simply being a slightly
less reliable procedure. Galton's diagram contained
the data given in the accompanying correlation
table or scatter diagram, Chart X I.

CHART X I
CORRELATION BETWEEN HEIGHTS OF MIDPARENTS AND OFFSPRING
(Axes: heights of midparents; heights of adult children
expressed as deviations from the height 68 1/4 inches.
Cell frequencies not reproduced.)

Deviations being measured from 68.25 inches,
which is a small fraction of an inch away from
the actual means, are labeled ξ and ζ instead of
x and y, and account of this slight difference
is taken in Section 5. From just such data as
given (in fact, it is likely that these identical
data were involved), Galton inferred certain
relationships which we now know hold with every
normal correlation surface [formula 10:10].

(a) A plot of the means of the horizontal arrays
(rows) as shown by the x's shows the "re-
version" of offspring upon height of mid-parent. 
Thus if the mid-parent height is 2.50 above the 
mean, the average or most probable height of 
offspring is 1.25 inches above the mean. 

(b) The line connecting these means may be 
closely represented by a straight line through 
the origin, or intersection, of the means of the 
two distributions. This is the line showing the 
regression (or "reversion") of offspring upon 
mid-parent. 

(c) There is a reversion or regression of mid-
parent upon offspring. This would be represented
by a straight line passing approximately through 
the o's. Thus for every correlation table there 
are two regression lines. 

(d) The slopes of these two lines are equal,
provided the standard deviations of the two
distributions are equal.

(e) If the standard deviations are equal this
slope varies between zero and one (Galton did
not suggest the existence of negative correlations),
and may be represented by the symbol "r."

(f) The standard deviations of the measures
found in successive arrays (successive rows, or
successive columns) are approximately equal and
are smaller than the standard deviations of the
total distribution, so that, if σ_2 equals the
standard deviation of the heights of offspring
and σ_2.1 equals the standard deviation of offspring
corresponding to given heights of mid-parent, then

    \sigma_{2.1}^2 = \sigma_2^2 (1 - \lambda)

where λ is a positive quantity less than 1.0.
Also, dealing with columns instead of rows,

    \sigma_{1.2}^2 = \sigma_1^2 (1 - \lambda)

in which λ is the same as before, σ_1 the standard
deviation of heights of mid-parents, and σ_1.2
the standard deviation of heights of mid-parents
corresponding to given heights of offspring.
These relationships, which Galton discovered in
his data, will require re-examination and restatement
(see Chapter XI, Section 4) for less
linear and less homoscedastic (less equal variability
of arrays) data.

(g) There is a simple relationship between λ
and r. It is λ = r^2, so that

    \sigma_{2.1}^2 = \sigma_2^2 (1 - r^2)    [10:09]
    \sigma_{1.2}^2 = \sigma_1^2 (1 - r^2)

Variance of deviations from points on regression line
(see [10:24] and [10:49])


(h) Each array is approximately normal if the 
total distributions are normal.

(i) If contour lines for different frequency
densities are drawn in the diagram, they consti- 
tute a system of similar and similarly placed 
ellipses, the conjugate diameters of which are 
the two regression lines. 

Galton made no claim to mathematical ability, 
but through sheer insight into the phenomena of 
mutual implication made these penetrating observations.
He carried his conclusions, stated in
probability terms, as to the nature of the cor- 
relation surface, to J. D. Hamilton Dickson (1886), 
a mathematician, who readily wrote down the nor- 
mal correlation equation involving two variables. 
In the notation here used this is: 


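    y = \frac{N}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}}
        \exp\left\{ -\frac{1}{2(1 - r^2)}
        \left( \frac{x_1^2}{V_1} - \frac{2 r x_1 x_2}{\sigma_1 \sigma_2}
             + \frac{x_2^2}{V_2} \right) \right\}    [10:10]

Normal correlation surface, two variables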


Galton's humility, after years of collection 
of data and subtle analysis of the same, in the 
face of the neat but not involved mathematical 
derivation, is worthy of note by the social sci- 
entists of this day who scoff at mathematical 
analysis. Upon receiving the solution of his 
problem from Mr. Dickson, he wrote:* "I may be 
permitted to say that I never felt such a glow 
of loyalty and respect towards the sovereignty and 
magnificent sway of mathematical analysis as when
this answer reached me, confirming, by purely 
mathematical reasoning, my various and laborious 
statistical conclusions with far more minuteness 
than I had dared to hope, for the original data 
ran somewhat roughly, and I had to smooth them 
with tender caution." 


* see Karl Pearson, NOTES, 1920. 
SECTION 4. ALGEBRAIC STATEMENT OF GALTON'S GRAPHIC
FINDINGS AND DERIVATION OF CORRELATION FORMULAS 


Let us consider these discoveries in detail.
Let x_1 stand for the height of mid-parent, x_2 for
that of offspring, each as a deviation from its
mean. Let the variances be V_1 and V_2 and the
standard deviations σ_1 and σ_2, while r is the
slope of the regression line in the "reduced"
scatter diagram, i.e., in a correlation table in
which the measures entered are x_1/σ_1 and x_2/σ_2.
Galton reduced by dividing by the quartile deviations,
leading to essentially the same result as
here. The slopes of the two regression lines are
equal and equal to r. We will shortly obtain a
value of r by means other than the graphic method
of Galton. Finally, let x̄_2 stand for an estimated
height of offspring, knowing the mid-parent height,
and let x̄_1 be an estimated height of mid-parent,
knowing the offspring height. An alternative and
more precise notation for x̄_2 is x_2Δ1, which may
be read "x_2 dependent upon x_1." Similarly, an
alternative notation for x̄_1 is x_1Δ2. With this
notation, discoveries (a) and (b) together are
equivalent to

    \frac{\bar{x}_2}{\sigma_2} = r \frac{x_1}{\sigma_1}    Fundamental form of regression equation [see 10:05]



Propositions (c) and (d) are equivalent to the
addition of the second fundamental regression
equation, as follows:

    \frac{\bar{x}_1}{\sigma_1} = r \frac{x_2}{\sigma_2}    [see 10:05a]


Proposition (e) is liable to misinterpretation.
If r = 0, it asserts that there is no relationship
between the variables, no regression of one
variable upon the other, while r = 1 means complete
mutual implication. More loosely stated,
this latter situation will be described as one of
complete dependence. So far as the data are concerned
there is no evidence that the heights of
parents have any more to do in causing the heights
of offspring than do the heights of offspring in
causing the heights of the parents. This is
still true should r = 1. A situation exists and a
correlation coefficient measures the tendency of
the paired observations to be related, but it
gives no evidence whether x_1 is the cause of x_2,
x_2 the cause of x_1, or whether the cause is unknown
and lies back of both. We think of parents
being causal agents in determining the heights of
offspring, but we do this for reasons outside of
the scatter diagram, namely, the parents have
existed earlier than the offspring in a time series.

Propositions (f) and (g) are the result of
careful study of data, but Galton gave a simple
proof of (g). The variability of the offspring
generation is consequent to the variability of
the arrays (rows) and the variability of the
means of these arrays. If x̄_2i is the distance
of the mean of the i-array of offspring heights
from the mean of all offspring heights and, as
before, V_2.1 is the variance of this as well as
of each of the other arrays, and if n_i is the
number of measures in this array, then
(n_i V_2.1 + n_i x̄_2i^2) is the contribution of the
i-array in the calculation of the variance of the
distribution, thus:

    V_2 = \frac{\sum^{k} n_i V_{2.1}}{N} + \frac{\sum^{k} n_i \bar{x}_{2i}^2}{N}    Analysis of variance [10:11]
in which k over the summation sign indicates the
number of terms in the summation, i.e., k is the
number of classes into which the x_1 variable is
grouped. The first term of the right-hand member
yields

    \frac{\sum^{k} n_i V_{2.1}}{N} = V_{2.1}    [10:12]

Since, for any array,

    \bar{x}_{2i}^2 = r^2 \frac{V_2}{V_1} x_{1i}^2    [10:13]

and since

    \sum^{k} n_i x_{1i}^2 = \sum x_1^2 = N V_1    [10:14]

we obtain for the second right-hand term of
[10:11] V_2Δ1, the variance of the points on the
regression line

    V_{2\Delta 1} = \frac{\sum^{k} n_i \bar{x}_{2i}^2}{N} = r^2 V_2    Variance of estimated x_2's [10:15]


It is important to distinguish between this, the
variance of the points on the regression line,
and [10:50], the variance error of a point on
the regression line.

V_2Δ1 is read as "the variance of that part of
x_2 that is dependent upon x_1."
Accordingly, we have

    V_2 = V_{2.1} + V_{2\Delta 1}    [10:16]

Total variance in terms of the variance of arrays and
of the variance of points on linear regression line

    V_2 = V_{2.1} + r^2 V_2    [10:17]


Transposing terms

    V_{2.1} = V_2 (1 - r^2)    Variance of an array [10:18]

Also

    V_{1.2} = V_1 (1 - r^2)

This derivation proves Galton's proposition (g),
that the reducing factor is equal to r^2.
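
The partition [10:16] to [10:18] can be checked numerically. The sketch below is an illustration under stated assumptions: the paired data are synthetic (generated with a fixed seed, not Galton's), the variable names are this sketch's own, and the residual variance is taken over the whole sample about the least-squares line rather than array by array.

```python
import random

random.seed(0)
# Synthetic paired scores (hypothetical): x1 plays mid-parent, x2 offspring.
raw1 = [random.gauss(0.0, 1.5) for _ in range(2000)]
raw2 = [0.6 * a + random.gauss(0.0, 1.8) for a in raw1]

def deviations(values):
    """Express scores as deviations from their own mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

x1, x2 = deviations(raw1), deviations(raw2)
n = len(x1)
V1 = sum(a * a for a in x1) / n
V2 = sum(b * b for b in x2) / n
c12 = sum(a * b for a, b in zip(x1, x2)) / n
r = c12 / (V1 * V2) ** 0.5
b21 = c12 / V1                       # slope of x2 upon x1

estimates = [b21 * a for a in x1]    # points on the regression line
residuals = [b - e for b, e in zip(x2, estimates)]
V_est = sum(e * e for e in estimates) / n   # variance of estimated x2's
V_res = sum(u * u for u in residuals) / n   # variance about the line

print(V2, V_res + V_est, r * r * V2, V2 * (1 - r * r))
```

For any sample fitted by least squares the printed quantities agree in pairs: the total variance equals the sum of the two parts, the variance of the estimates equals r^2 V_2, and the residual variance equals V_2 (1 - r^2).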

(h) is an experimental finding which, coupled
with (g) and (a), (b), (c), and (d), quickly
gives the equation of the normal bivariate
correlation surface.

A normal equation, with area 1.0, is given
herewith:

    y = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{x^2}{2V} \right)    [10:19]


Let any integral value of x include all cases
with values between (x - .5) and (x + .5). Further,
y is the ordinate at the point x, so that y × 1
is the area of a rectangle with ordinate y and
base from (x - .5) to (x + .5). Thus, if the
(interval/σ) is small, and let us here employ such
units of measurement that this is so, then this
rectangle closely approximates the area under the
curve for the region from (x - .5) to (x + .5); and
y is the probability that a case drawn at random
will have the value x.

Referring to the two-dimensional scatter diagram,
a similar statement for the probability
that a measure in the x_2 array shall have the
value x_1 is y_1.2, the ordinate for variable
values of x_1 for a fixed value of x_2

    y_{1.2} = \frac{1}{\sigma_1 \sqrt{1 - r^2} \sqrt{2\pi}}
              \exp\left\{ -\frac{(x_1 - \bar{x}_1)^2}{2 V_1 (1 - r^2)} \right\}    [10:20]

The probability that a measure will lie in the x_2
array is

    y_2 = \frac{1}{\sigma_2 \sqrt{2\pi}} \exp\left( -\frac{x_2^2}{2 V_2} \right)    [10:19]

The probability of the joint occurrence, or that
a case chosen at random will have the value x_1
and the value x_2, is the product of these two
probabilities, or y = y_2 y_1.2, which reduces to

    y = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}}
        \exp\left\{ -\frac{1}{2(1 - r^2)}
        \left( \frac{x_1^2}{V_1} - \frac{2 r x_1 x_2}{\sigma_1 \sigma_2}
             + \frac{x_2^2}{V_2} \right) \right\}    [10:21]

Normal bivariate distribution of volume 1.0


Multiplying the right-hand member by N yields 
the normal bivariate distribution with number of 
cases equal to N, which Dickson wrote down to
satisfy the experimental findings that Galton re- 
ported to him. 
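
The step from the array ordinate and the marginal ordinate to the bivariate surface can be verified numerically. In this sketch the function names are assumptions of the illustration, the array mean is taken as r (σ_1/σ_2) x_2 in line with the regression equation of Section 4, and the constants are the rounded Galton values quoted in this chapter.

```python
import math

def normal_ordinate(x, variance):
    """Ordinate of a normal curve of area 1.0 at deviation x."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def array_ordinate(x1, x2, s1, s2, r):
    """Ordinate for x1 within the array of fixed x2."""
    array_mean = r * (s1 / s2) * x2
    array_variance = s1 * s1 * (1 - r * r)
    return normal_ordinate(x1 - array_mean, array_variance)

def bivariate_ordinate(x1, x2, s1, s2, r):
    """Normal bivariate distribution of volume 1.0."""
    q = (x1 * x1 / s1 ** 2
         - 2 * r * x1 * x2 / (s1 * s2)
         + x2 * x2 / s2 ** 2)
    norm = 2 * math.pi * s1 * s2 * math.sqrt(1 - r * r)
    return math.exp(-q / (2 * (1 - r * r))) / norm

s1, s2, r = 1.45, 2.21, 0.39   # rounded values from the Galton data
x1, x2 = 1.0, -2.5             # an arbitrary point, in inches from the means
product = normal_ordinate(x2, s2 * s2) * array_ordinate(x1, x2, s1, s2, r)
print(product, bivariate_ordinate(x1, x2, s1, s2, r))
```

The two printed values agree, which is the reduction of y = y_2 y_1.2 to the bivariate form.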


SECTION 5. DERIVATION OF COMPUTATIONAL FORMULAS FOR b AND r


The magnitude r has, to this point, been defined 
as the slope of the regression line in case stand- 
ard scores (see [10:05] and [10:05a]) are the 
measures entered into the correlation table. We 
will now prove that in any scatter diagram, the 
two "best fit" linear regression lines are, as 
recorded, [10:04] and [10:04a], in which r is 
readily computable by formula [10:29] and the b's 
by formulas [10:30] and [10:31].

The term "best fit" is used as in the method 
of least squares. A "best fit" determination is 
one in which the variance error of estimate is 
minimal. Determinations can be made resulting 
in the sum of the deviations, of the cubes of the
deviations, of their fourth powers, etc., being 
a minimum, but since the days of Gauss, it has 
been known that in the case of a normal distri- 
bution none of these determinations will result 
in as small a median error as one in which the 
sum of the squares of the errors of estimate is 
made a minimum. The constants of distributions 
which are quite divergent from the normal, so 
determined that the variance error is minimal, 
are undoubtedly excellent determinations, but it 
is no longer possible to say that constants so 
calculated have smaller median errors than would 
others derived upon a different principle. In 
derivations herewith the Gaussian principle of 
least squares is adhered to. 

In Chart X I the regression line drawn is that
of x_2 upon x_1. Its equation is

    \bar{x}_2 = r \frac{\sigma_2}{\sigma_1} x_1 = b_{21} x_1    [10:22]

The "slope" of the regression line is b_21 and
equals tan φ. Having given a value of x_1, the
best estimate of the corresponding x_2 is x̄_2,
as given by [10:22]. In general x̄_2 will not be
identical with the actual or experimentally
obtained value of x_2, so that

    x_2 - \bar{x}_2 = x_{2.1}    Error of estimate [10:23]

is an error of estimate. The variance error of
estimate is given by

    V_{2.1} = \frac{\sum x_{2.1}^2}{N}    Variance error of estimate (see [10:09] and [10:49])  [10:24]

The value of b_21 which makes this minimal is the
value sought.


    N V_{2.1} = \sum x_{2.1}^2 = \sum (x_2 - b_{21} x_1)^2
              = \sum x_2^2 - 2 b_{21} \sum x_1 x_2 + b_{21}^2 \sum x_1^2
              = N V_2 - 2 N b_{21} c_{12} + N b_{21}^2 V_1

in which c_12, called a covariance, or product
moment, equals (Σx_1x_2)/N. Dividing by N and
completing the square on the last two terms, we
have

    V_{2.1} = V_2 - 2 b_{21} c_{12} + b_{21}^2 V_1
            = V_2 - \frac{c_{12}^2}{V_1} + V_1 \left( b_{21} - \frac{c_{12}}{V_1} \right)^2

As the ( ) term is squared, it can never be negative,
so obviously V_2.1 takes its proper, or
smallest, value when the ( ) term, which is the
only term containing b_21, equals zero. Solving

    b_{21} = \frac{c_{12}}{V_1} = \frac{\sum x_1 x_2}{N V_1} = \frac{\sum x_1 x_2}{\sum x_1^2}    [10:25]

Regression coefficient of X_2 upon X_1

By parity

    b_{12} = \frac{c_{12}}{V_2} = \frac{\sum x_1 x_2}{N V_2} = \frac{\sum x_1 x_2}{\sum x_2^2}    [10:26]

Regression coefficient of X_1 upon X_2
The correlation coefficient is either regression
coefficient, when standard scores are involved,
since they are then equal. To show that
this common value is r as given by [10:07] is
left as an exercise.

When b_12 ≠ b_21 we can note that the sign of
both b's is that of Σx_1x_2, so that

    r = \pm \sqrt{b_{12} b_{21}}    [10:27]

with the sign attached to r that is the sign of
b_12 or of b_21.
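
The least squares argument of this Section can be illustrated with a small numerical sketch. The paired scores below are hypothetical, and the helper names are assumptions of the example; the point is only that the slope c_12/V_1 gives a smaller variance error of estimate than neighboring slopes, and that r recovered from the two regression coefficients agrees with the product-moment value.

```python
# Hypothetical paired scores (illustrative only).
raw1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw2 = [2.1, 2.9, 4.2, 4.8, 6.1, 6.4]

def centered(values):
    """Deviation scores about the mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

x1, x2 = centered(raw1), centered(raw2)
n = len(x1)
V1 = sum(a * a for a in x1) / n
V2 = sum(b * b for b in x2) / n
c12 = sum(a * b for a, b in zip(x1, x2)) / n

b21 = c12 / V1   # x2 upon x1
b12 = c12 / V2   # x1 upon x2

def variance_error_of_estimate(slope):
    """Mean squared error of estimating x2 as slope * x1."""
    return sum((b - slope * a) ** 2 for a, b in zip(x1, x2)) / n

sign = 1.0 if c12 >= 0 else -1.0
r_from_slopes = sign * (b12 * b21) ** 0.5
r_product_moment = c12 / (V1 * V2) ** 0.5

print(b21, variance_error_of_estimate(b21))
print(r_from_slopes, r_product_moment)
```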

The formulas for r and b based upon deviation
scores, x_1 and x_2, are unsatisfactory for
computational purposes because x_1 and x_2 would
ordinarily be continuing decimals. We require a
method utilizing integral values corresponding
to the midpoints of the classes into which the X_1
and X_2 scores have been grouped. To avoid a
notation calling for subscripts of subscripts, we
let X, x, and ξ have meanings for the X_1 variable
as already defined [Ch. VI, Sec. 6]. The grouping
interval in this first variable is i_1. We let Y,
y, ζ, and i_2 have corresponding meanings for the
X_2 variable. Formulas for the evaluation of M_1,
V_1, M_2, and V_2 in terms of ξ and ζ scores and of
i_1 and i_2 have already been given, so we only
need to express c_12 in these terms to have
available computational procedures based upon the
convenient ξ and ζ scores.

    N c_{12} = \sum x y = \sum [(\xi - M_\xi) i_1] [(\zeta - M_\zeta) i_2]
             = i_1 i_2 (\sum \xi\zeta - M_\xi \sum \zeta - M_\zeta \sum \xi + N M_\xi M_\zeta)

    c_{12} = \frac{i_1 i_2}{N} (\sum \xi\zeta - N M_\xi M_\zeta)    [10:28]

Covariance from ξ and ζ scores


Utilizing this with earlier formulas we have 
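    r = \frac{c_{12}}{\sigma_1 \sigma_2}
      = \frac{\sum \xi\zeta - N M_\xi M_\zeta}{N \sigma_\xi \sigma_\zeta}    Work formula for r [10:29]

    b_{12} = \frac{c_{12}}{V_2}
           = \frac{i_1 (\sum \xi\zeta - N M_\xi M_\zeta)}{i_2 N V_\zeta}    Work formula for b_12 [10:30]

    b_{21} = \frac{c_{12}}{V_1}
           = \frac{i_2 (\sum \xi\zeta - N M_\xi M_\zeta)}{i_1 N V_\xi}    Work formula for b_21 [10:31]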


Computing the correlation and regression constants
for the Galton data by means of these
formulas we have

    N = 324
    Arbitrary origin_1 = 68.25, and i_1 = .5
    Arbitrary origin_2 = 68.25, and i_2 = .5

    M_\xi = .1173, so that M_1 = 68.25 + .5(.1173) = 68.31

    M_\zeta = -.0617, so that M_2 = 68.25 + .5(-.0617) = 68.22

    V_\xi = \frac{2740}{324} - (.1173)^2 = 8.4430, so V_1 = 8.4430 (.5)^2 = 2.1108

    V_\zeta = \frac{\sum \zeta^2}{324} - (-.0617)^2 = 19.5024, so V_2 = 19.5024 (.5)^2 = 4.8756
    and \sigma_2 = 2.2081

    c_{12} = \frac{(.5)(.5)}{324} [1618 - 324 (.1173)(-.0617)] = 1.2503

    b_{12} = \frac{1.2503}{4.8756} = .2564

    b_{21} = \frac{1.2503}{2.1108} = .5923

    r_{12} = \frac{1.2503}{(1.4528)(2.2081)} = .3897

    \bar{X}_2 = 27.76 + .5923 X_1    (Substituting in [10:01a])

The reliability and, accordingly, number of decimal
places to be retained, of these correlation
constants is discussed in this chapter, Sections
6 and 7.
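
The arithmetic of the Galton computation can be retraced from the summary values quoted above. This sketch takes N, the arbitrary origin, the grouping intervals, the ξ and ζ means and variances, and Σξζ = 1618 as given; the Python variable names are assumptions of the illustration.

```python
N = 324
origin, i1, i2 = 68.25, 0.5, 0.5
M_xi, M_zeta = 0.1173, -0.0617     # means of the xi and zeta scores
V_xi, V_zeta = 8.4430, 19.5024     # variances of the xi and zeta scores
sum_xi_zeta = 1618                 # sum of products of xi and zeta

M1 = origin + i1 * M_xi            # mean height of mid-parents
M2 = origin + i2 * M_zeta          # mean height of offspring
V1 = V_xi * i1 ** 2
V2 = V_zeta * i2 ** 2
c12 = i1 * i2 * (sum_xi_zeta - N * M_xi * M_zeta) / N   # covariance from xi, zeta

b12 = c12 / V2                     # mid-parent upon offspring
b21 = c12 / V1                     # offspring upon mid-parent
r = c12 / (V1 * V2) ** 0.5
intercept = M2 - b21 * M1          # constant term of the regression in raw scores

print(round(M1, 2), round(M2, 2), round(c12, 4))
print(round(b12, 4), round(b21, 4), round(r, 4), round(intercept, 2))
```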

The computation shown on Chart X I is straight- 
forward, but it does not provide as adequate 
checks upon numerical accuracy as are desirable. 
Many forms have been devised to make the steps 
involved mechanical and such that nonstatisti- 
cally trained clerks can compute correlational 
constants rapidly and accurately. Some of these 
are especially designed for use with computing 
machines. The form given in Appendix C is an 
example of one that is serviceable either with 
or without a computing machine. It is a little 
longer than some of the forms, but it provides 
more complete checks than usual for summations 
are obtained in two independent ways. The steps 
to be followed will not be explained in detail 
as they are practically self-explanatory, but 
attention is called to the following facts: that
s = ξ + ζ; that d = ξ - ζ; that the italic
entries are the computational or manual entries;
that "LL-UR diag" indicates that cells in a line
from the lower-left to the upper-right all have
the same d-values, so that summations of frequencies
along such a line give all the frequencies
having the particular d-value recorded at
the end of this line at the bottom or top of the
scatter diagram; that cells in a line from the
upper-left to the lower-right, "UL-LR diag," all
have the same s-values, so that summations along
such a line give all the frequencies having the
particular s-value recorded at the right or
the left of the scatter diagram; and the value
printed in the lower right corner of each cell
is the value of the product ξζ for that cell.
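
One way the sums s = ξ + ζ and d = ξ - ζ can serve as independent checks is through two algebraic identities: Σs² - Σd² = 4Σξζ and Σs² + Σd² = 2(Σξ² + Σζ²). The sketch below illustrates them on a small hypothetical frequency table; it is an assumption of this example that checks of this kind are what the form is built around, and the layout of Appendix C itself is not reproduced.

```python
# Hypothetical grouped scatter diagram: (xi, zeta) cell -> frequency.
cells = {
    (-1, -1): 3, (-1, 0): 2, (0, 0): 5, (0, 1): 1,
    (1, 0): 2, (1, 1): 4, (2, 1): 1,
}

sum_xi_zeta = sum(f * xi * zeta for (xi, zeta), f in cells.items())
sum_xi2 = sum(f * xi ** 2 for (xi, _), f in cells.items())
sum_zeta2 = sum(f * zeta ** 2 for (_, zeta), f in cells.items())

# Sums obtained along the two sets of diagonals.
sum_s2 = sum(f * (xi + zeta) ** 2 for (xi, zeta), f in cells.items())
sum_d2 = sum(f * (xi - zeta) ** 2 for (xi, zeta), f in cells.items())

product_sum_from_diagonals = (sum_s2 - sum_d2) / 4
print(sum_xi_zeta, product_sum_from_diagonals)
```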


SECTION 6. THE REGRESSION EQUATION AND THE 
ANALYSIS OF VARIANCE 


We have noted [10:16] that having two correlated
variables, X_0 and X_1, the total variance
V_0 is divisible into two independent parts, one
being capable of being estimated from X_1 and the
other, V_0.1, being independent of X_1, thus

    V_0 = V_{0\Delta 1} + V_{0.1}    See [10:16]

    (N - 1) = 1 + (N - 2)    Degrees of freedom equation [10:32]

The degrees of freedom of these three variances
are (N - 1), 1, and (N - 2) and, as always, a degrees
of freedom equation corresponding to a variance
equation can be written. As shown in Chapter VI,
the number of degrees of freedom of V_0 is (N - 1).
Clearly, having a given set of X_1 values the
variance of X̄_0 (identical with the variance of
X_0Δ1) is consequent to one constant, b_01, and thus
has one degree of freedom. The residual variance,
V_0.1, must then have exactly (N - 2) degrees
of freedom.
Or, we can determine this by noting that the
residual deviations, x_0.1, are those after two
linear restrictions (leading to M_0 and b_01) have
been placed upon the X_0 measures. Since M_0 is
not known a priori, but found from the sample,
this constitutes placing the linear restriction,

    \sum X_0 = N M_0    [10:33]

upon the X_0's. Since b_01 is not known a priori,
but found from the sample, this constitutes
placing the added linear restriction,

    \sum X_0 X_1 = N M_0 M_1 + N V_1 b_{01}    [10:34]

upon the X_0's. Having given, i.e., fixed, values
of X_1, the summation ΣX_0X_1 is of course linear in
X_0. When fitting parabolic regression lines
further linear restrictions of the sort, ΣX_0X_1^2,
ΣX_0X_1^3, etc., are involved.
The sample at hand is to be conceived as one
of an indefinitely large number, all of which
have identically the same set of X_1 values. Thus
the parent population varies only in having
different X_0 values from sample to sample.

The two parameters whose distributions we are
concerned with, should many samples be taken,
are M_0 and b_01, as indicated in [10:33] and
[10:34]. Let the first hypothesis that we desire
to test be that M_0 is a chance deviate from M̃_0,
and the second that b_01 is a chance deviate from
b̃_01, the tilde here marking a hypothetical true
value. We make the first test under the assumption
that b_01 = b̃_01 and the second under the
assumption that M_0 = M̃_0. Otherwise expressed, we
test the hypothesis that M_0 is a chance deviate
from M̃_0 for the universe of samples having the
regression b_01, and we test the hypothesis that
b_01 is a chance deviate from b̃_01 for the universe
of samples having the mean M_0. Actual equation
[10:35] and theoretical equation [10:36] are the
basis for the first test and equations [10:35]
and [10:37] are the basis for the second.


    X_0 - M_0 = b_{01} x_1 + x_{0.1}    [10:35]

    X_0 - \tilde{M}_0 = b_{01} x_1 + \tilde{x}_{0.1}    Hypothesis when \tilde{M}_0 is being tested [10:36]

    X_0 - M_0 = \tilde{b}_{01} x_1 + \tilde{x}_{0.1}    Hypothesis when \tilde{b}_{01} is being tested [10:37]


Subtracting [10:36] from [10:35] and transposing
terms, we obtain

    \tilde{x}_{0.1} = (M_0 - \tilde{M}_0) + x_{0.1}

x̃_0.1 is divided into two independent parts and
for the first sample we have

    \tilde{V}_{0.1} = (M_0 - \tilde{M}_0)^2 + V_{0.1}

in which (M_0 - M̃_0)^2 has one degree of freedom and
V_0.1 has (N - 2), so we obtain the variance ratio

    F_{1,(N-2)} = \frac{(M_0 - \tilde{M}_0)^2 (N - 2)}{V_{0.1}}
                = \frac{(M_0 - \tilde{M}_0)^2 (N - 2)}{V_0 (1 - r_{01}^2)}    [10:38]

F to test hypothesis that true mean = M̃_0, when true regression = b_01

Utilizing [10:35] and [10:37] we similarly obtain

    F_{1,(N-2)} = \frac{V_1 [b_{01} - \tilde{b}_{01}]^2 (N - 2)}{V_0 (1 - r_{01}^2)}    [10:39]

F to test hypothesis that true regression = b̃_01, when true mean = M_0
The qualifications next to formula numbers [10:38]
and [10:39], "when true regression = b_01" and
"when true mean = M_0," may ordinarily be considered
immaterial because in general the correlation
between mean and regression coefficient
approximates zero.

The square root of F_{1,(N-2)} is "Student's t"
and if r < .90 and N > 25 (in fact, for most purposes
if N > 12) it may be interpreted as a deviate
in a unit normal distribution.

The one-third-sigma rule for the number of
decimal places to keep when publishing, given in
Chapter VI, Section 7, is based upon the value
of the standard error of the item in question.
The square root of a variance ratio is a critical
ratio, so from [10:38] we note that

    V_{M_0} = \frac{V_0 (1 - r_{01}^2)}{N - 2}    [10:40]

Variance error of M_0 determined from correlated data

From this we obtain the standard error and apply
the one-third-sigma rule. For the Galton data,

    V_{M_1} = \frac{2.1108 (1 - .3897^2)}{322} = .00556;  \sigma_{M_1} = .075;
    and \frac{\sigma_{M_1}}{3} = .025

We accordingly publish M_1 as 68.31.
From [10:39] we note that

$$V_{b_{01}} = \frac{V_0(1 - r_{01}^2)}{V_1(N-2)}$$  [10:41]
Variance error of $b_{01}$, cf. [13:143]

This, with formulas [6:58] and [6:60], provides the basis for expressing published results as herewith:

$M_1$ = 68.31; $V_1$ = 2.11; $\sigma_1$ = 1.45; $b_{10}$ = .26
$M_0$ = 68.22; $V_0$ = 4.9; $\sigma_0$ = 2.21; $b_{01}$ = .59;
and $r_{01}$ = .39.


SECTION 7. THE TRUSTWORTHINESS OF A CORRELATION COEFFICIENT


The [ ] term in [10:39] = $[r_{01}\sigma_0 - \text{true}(r_{01}\sigma_0)]^2$. If, instead of assigning a hypothetical value to the product of $r_{01}$ and $\sigma_0$, we assign such a value to $r_{01}$ and consider all samples having the standard deviation $\sigma_0$, then the right-hand member of [10:39] becomes

$$\frac{(r_{01} - \tilde{r}_{01})^2 (N-2)}{1 - r_{01}^2}$$

which, at first sight, looks as though it provided an $F_{1,N-2}$ test for $r_{01}$, but this is not so because, when fixing the value of $\sigma_0$, we have imposed the condition $\sum x_0^2 = NV_0$ and this is not linear in the $x_0$'s.

However, in the special case in which the hypothesis is that $\tilde{r}_{01} = 0$, there are such cancellations that [10:39], or the similar expression, interchanging variables, becomes

$$F_{1,N-2} = \frac{r_{01}^2 (N-2)}{1 - r_{01}^2}$$  [10:42]
F to test hypothesis that true correlation = 0

and we have a sound test of the significance of $r_{01}$.

We will mention three unequally excellent procedures which are available when $\tilde{r}_{01}$ does not equal zero. For the most precise evaluation for samples of 25 or smaller we may use David's Tables of the ordinates and probability integral of the distribution of r in small samples (1938).

A second method, which has valuable combinatorial properties in addition to excellent precision in the interpretation of single correlation coefficients, is the r-into-z technique of Fisher (1920-21).* Using "ln" to designate a natural logarithm and "log" to designate one to the base 10, the transformation is

$$z = \tfrac{1}{2}[\ln(1+r) - \ln(1-r)] = \tfrac{1}{2}\ln\frac{1+r}{1-r}$$  [10:43]
Fisher's r-into-z transformation

Since ln a = 2.3025851 log a, we may write this using logarithms to the base 10:

$$z = 1.1512925\,[\log(1+r) - \log(1-r)] = 1.1512925\,\log\frac{1+r}{1-r}$$  [10:43a]

Fisher has shown that z has the happy property of being very nearly normally distributed with a standard deviation which is a function of the size of the sample only:

$$\sigma_z = \frac{1}{\sqrt{N-3}}$$  [10:44]
Standard error for all values of z


If we postulate the value $\tilde{r}$, we apply the r-into-z transformation to both r and $\tilde{r}$, obtaining z and $\tilde{z}$. The critical ratio that concerns us is

$$\frac{z - \tilde{z}}{\sigma_z} = (z - \tilde{z})\sqrt{N-3}$$  [10:45]
Normally distributed critical ratio to test $(r - \tilde{r})$

* Wherein is a table of z for values of r, and in Fisher, STATISTICAL METHODS FOR RESEARCH WORKERS (1925 et seq.) is a table of r for values of z. A more detailed table of this relationship is given in columns "u" and "tanh u" of Table II of SMITHSONIAN MATH. TABLES, Hyperbolic Functions (1911 et seq.)


The third and older method is to obtain a critical ratio by dividing r by its standard error. This standard error is frequently taken as $\sigma_r = (1-r^2)/\sqrt{N}$, but an improved result is gotten if N is replaced by the number of degrees of freedom, N-2 (though it is true that one of the two degrees of freedom lost does not represent a linear restriction),

$$\sigma_r = \frac{1 - r^2}{\sqrt{N-2}}$$  [10:46]
Standard error of a total correlation coefficient

Many derivations of $\sigma_r$ have been made and with slightly different outcomes. Perhaps the best first approximation is


l-r? 


Е ОТО :л6а] 
заз4.0Гг 


which is obtained by combining two of Romanowsky's formulas.*

The critical ratio that now concerns us is

$$\frac{r - \tilde{r}}{\sigma_r} = \frac{(r - \tilde{r})\sqrt{N-2}}{1 - r^2}$$  [10:47]
Non-normally distributed critical ratio to test $(r - \tilde{r})$

If N is large (say N > 15) or r small (say r < .9), and particularly if both of these are true, [10:47] will give very serviceable results if we assume r to be normally distributed, but otherwise the lack of knowledge of the form of distribution of [10:47] is serious.

* Romanowsky, On the moments of standard deviation and of correlation coefficients in samples from a normal population, METRON, vol. 5, no. 4, 31-XII, 1925, formulas [121] and [124].

For the Galton data, where variables were labeled $X_1$ and $X_2$, we can test the hypothesis that there is zero correlation between mid-parent and offspring height by [10:42]:

$$F_{1,322} = \frac{(.3897)^2 (322)}{1 - (.3897)^2} = 57.67$$

This is so large a squared critical ratio that there is negligible probability that it could have arisen as a matter of chance, thus disproving the hypothesis.
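
The arithmetic of the [10:42] test can be checked with a few lines of code. What follows is only a sketch: the function name `f_zero_correlation` is an illustrative choice, not anything taken from the text, and the inputs are the Galton values r = .3897 and N = 324 quoted above.

```python
def f_zero_correlation(r: float, n: int) -> float:
    """Variance ratio F(1, n-2) of [10:42] for the hypothesis of zero correlation."""
    return r * r * (n - 2) / (1.0 - r * r)


# Galton data as quoted in the text: r = .3897 from N = 324 pairs.
f_galton = f_zero_correlation(0.3897, 324)
print(f"F(1, 322) = {f_galton:.2f}")
```

The result agrees with the printed 57.67 to within rounding in the second decimal place.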

Suppose some theory of genetics asserts that the correlation should be 4/9 and we desire to test this hypothesis. A test of [.3897 - .4444] would not take cognizance of the fact that the 4/9 is a value assuming no grouping error, whereas the .3897 has involved standard deviations in which, according to Sheppard's correction for coarseness of grouping [6:49], there is an appreciable error. We could correct the observed correlation coefficient for coarseness of grouping and then compute F, but this would not be precise for the entire theory of F and its distribution involved no such procedure. We must therefore keep the observed value .3897 exactly as it is and alter the theoretical value, 4/9, by an amount which compensates for the coarseness of grouping. Sheppard's correction for the variance is

$$\hat{V} = V - \frac{i^2}{12} = V\left(1 - \frac{i^2}{12V}\right)$$
Sheppard's correction to V for coarseness of grouping [see 6:49]

in which V is the observed value and i the size of the grouping interval of the X's. If we take the reducing factor [1-(i²)/(12V)] as a serviceable estimate of that to apply to the theoretical situation, we have

$$\tilde{r}_{12}\sqrt{V_1 V_2} = {}_{\infty}\tilde{r}_{12}\sqrt{V_1\left(1 - \frac{i_1^2}{12V_1}\right)V_2\left(1 - \frac{i_2^2}{12V_2}\right)}$$

yielding the theoretical $\tilde{r}$ value, stepped down for coarseness of grouping,

$$\tilde{r}_{12} = {}_{\infty}\tilde{r}_{12}\sqrt{\left(1 - \frac{i_1^2}{12V_1}\right)\left(1 - \frac{i_2^2}{12V_2}\right)}$$  [10:48]

For the Galton data $i_1 = i_2 = 1.0$ (not .5 as was the case in the computation involving ξ and η); $V_1$ = 2.1108; $V_2$ = 4.8765; and ${}_{\infty}\tilde{r}_{12}$ = 4/9. We accordingly obtain $\tilde{r}_{12}$ = .4318, and the difference that concerns us is $(r - \tilde{r})$ = (.3897 - .4318). Transforming r and $\tilde{r}$ into z and $\tilde{z}$, we obtain z = .4114, and $\tilde{z}$ = .4621, so that the [10:45] critical ratio is $(.4114 - .4621)\sqrt{321}$, which = -.9084. The probability that a positive or negative difference as great as this would arise as a matter of chance if r is a chance deviate from $\tilde{r}$ is .36. Thus the data are in excellent conformity with the hypothesis ${}_{\infty}\tilde{r}$ (i.e., $\tilde{r}$ without a grouping error) = 4/9.

By [10:47] the critical ratio is -.8907 and the corresponding P assuming normality is .37, which though in error is not sufficiently so to lead to a different practical conclusion.
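
The steps of this example can be retraced in code. The sketch below assumes the readings of [10:45], [10:47] and [10:48] used in this section; the function names are hypothetical, and the two-tailed probability is taken from the unit normal distribution by way of `math.erfc`.

```python
import math


def sheppard_adjusted_r(r_theory, i1, v1, i2, v2):
    """Step a theoretical r down for coarseness of grouping, as in [10:48]."""
    return r_theory * math.sqrt((1 - i1**2 / (12 * v1)) * (1 - i2**2 / (12 * v2)))


def z_critical_ratio(r, r_hyp, n):
    """Critical ratio (z - z_hyp) * sqrt(N - 3) of [10:45]; atanh is Fisher's z."""
    return (math.atanh(r) - math.atanh(r_hyp)) * math.sqrt(n - 3)


def two_tailed_p(critical_ratio):
    """Probability of a unit normal deviate at least this large in absolute value."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2))


# Galton constants as quoted in the text.
r_obs, n = 0.3897, 324
r_hyp = sheppard_adjusted_r(4 / 9, 1.0, 2.1108, 1.0, 4.8765)
cr_z = z_critical_ratio(r_obs, r_hyp, n)
p_z = two_tailed_p(cr_z)

# The older critical ratio of [10:47], using sigma_r = (1 - r^2) / sqrt(N - 2).
cr_r = (r_obs - r_hyp) * math.sqrt(n - 2) / (1 - r_obs**2)

print(f"r_hyp = {r_hyp:.4f}, z ratio = {cr_z:.3f}, P = {p_z:.2f}, r ratio = {cr_r:.3f}")
```

Carrying full precision rather than four-place values shifts the critical ratios slightly in the third decimal place, which does not alter the conclusion.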


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SECTION 8. AVERAGING CORRELATION COEFFICIENTS 


To illustrate the combinatorial excellence of 
z from r let us assume that a number of determi- 
nations of the correlation between mid-parent 
height and offspring height were made and that, 
except for the numbers of cases involved, they 
were equally trustworthy. Assume the results to 
be 


an r of .3897 from a sample of 324
an r of .3542 from a sample of 156
an r of .4421 from a sample of 13


An r based upon all the data is desired. The procedure is to transform each r into a z, weight these inversely as their variance errors, and average. Thus

r = .3897, z = .4114, which is to be weighted 321
r = .3542, z = .3702, which is to be weighted 153
r = .4421, z = .4748, which is to be weighted 10

$$\text{mean } z = \frac{321(.4114) + 153(.3702) + 10(.4748)}{321 + 153 + 10} = .3997, \text{ equivalent to } r = .3797.$$

This mean z and equivalent r are as reliable as if computed from a sample of 487, which = 321 + 153 + 10 + 3.
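
The averaging procedure just described may be sketched as follows. The function name `average_correlations` is an illustrative assumption; the weights are N - 3, the reciprocals of the variance errors of z given by [10:44].

```python
import math


def average_correlations(pairs):
    """Average (r, N) pairs through Fisher's z, weighting each z by N - 3.

    Returns the mean z, the equivalent r, and the sample size to which
    the combined result is equivalent in reliability.
    """
    weights = [n - 3 for _, n in pairs]
    zs = [math.atanh(r) for r, _ in pairs]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return mean_z, math.tanh(mean_z), sum(weights) + 3


# The three determinations assumed in the text.
mean_z, mean_r, n_equiv = average_correlations(
    [(0.3897, 324), (0.3542, 156), (0.4421, 13)]
)
print(f"mean z = {mean_z:.4f}, equivalent r = {mean_r:.4f}, N = {n_equiv}")
```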


SECTION 9. THE TRUSTWORTHINESS OF A
POINT ON THE REGRESSION LINE


The regression [10:01] enables us to estimate $X_0$ knowing $X_1$. In this case the error of estimate is $(X_0 - \hat{X}_0)$ and the variance error is

$$V_{0\cdot 1} = V_0(1 - r_{01}^2)$$
Variance of the errors of estimate when $X_0$ is estimated from $X_1$ (see [10:09] and [10:24])

The square root of this is the standard error of estimate. It is the crucial measure giving the precision with which we can estimate one variable from a knowledge of another. Since the quantities $(X_0 - \hat{X}_0)$ are "errors" we may regularly assume them to be normally distributed with a mean of zero and a standard deviation of

$$\sigma_{0\cdot 1} = \sqrt{V_{0\cdot 1}} = \sigma_0\sqrt{1 - r_{01}^2}$$  [10:49]
Standard error of estimate (see [10:09] and [10:24])


If our concern is not with individual $X_0$ scores, but with the difference between the regression line found and the true regression line, then we are interested in the variance errors of $M_0$ and $b_{01}$, as already discussed. We may also be interested in the trustworthiness of some specific $\hat{X}_0$ corresponding to a specific $X_1$. The variance error of this has been given by Pearson (1913 Freq.) and, with minor modification of changing N to N-2, is

$$V_{\hat{X}_0} = \frac{V_0(1 - r_{01}^2)}{N-2}\left[1 + \frac{(X_1 - M_1)^2}{V_1}\right]$$  [10:50]
Variance error of a point on a regression line when distributions of arrays are similar
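
As a hedged illustration of [10:50], read as V_0(1 - r²)/(N - 2) multiplied by [1 + (X_1 - M_1)²/V_1], the sketch below evaluates the variance error at the mean of X_1 and at one standard deviation from it. The function name is hypothetical, and the Galton constants from Section 6 serve only as convenient inputs.

```python
import math


def variance_error_of_point(x1, m1, v1, v0, r, n):
    """Variance error of the regression estimate of X0 at a given X1, per [10:50]."""
    return v0 * (1 - r * r) / (n - 2) * (1 + (x1 - m1) ** 2 / v1)


# Galton constants quoted earlier: M1 = 68.31, V1 = 2.1108, V0 = 4.8765.
m1, v1, v0, r, n = 68.31, 2.1108, 4.8765, 0.3897, 324
at_mean = variance_error_of_point(m1, m1, v1, v0, r, n)
one_sigma_out = variance_error_of_point(m1 + math.sqrt(v1), m1, v1, v0, r, n)

print(f"at the mean: {at_mean:.5f}; one sigma away: {one_sigma_out:.5f}")
```

At the mean the bracketed factor is 1, so the result reduces to the variance error of the mean; one standard deviation away the bracketed factor is 2 and the variance error is doubled.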


We can illustrate the use of this formula by citing a problem arising in the "standardization" of educational or psychological tests. Let $X_1$ be the score on a comprehensive and highly reliable* psychological or educational test which is applicable to a wide range of talent, which we will call the "anchor test" and for which adequate "norms" exist. Let $X_0$ be a score upon a new test in the process of being "standardized," or having norms established. The correlation between it and $X_1$ for a sample of N has been obtained. We desire to tie the new test to the anchor test and secure a set of parallel scores such that we can assert $\hat{X}_{0a}$ is the norm for those of ability $X_{1a}$; $\hat{X}_{0b}$ for those of ability $X_{1b}$; etc. Introducing this value $X_{1a}$ into [10:01] and noting that $X_1 = X_{1a}$ yields $\hat{X}_{0a}$, which is the desired parallel score, or equivalent score, to $X_{1a}$. Substituting $X_{1a}$ for $X_1$ in [10:50] gives the variance error of this equivalent score.

Clearly, if $r_{01}$ is high and $X_{1a}$ does not differ greatly from $M_1$, we obtain $\hat{X}_{0a}$ with high precision even though N is rather small. This is thus a means, at moderate testing cost, of establishing the norms of a new test by tying them to those of a highly reliable previous measure.

* If not highly reliable, a modification of the procedure here outlined, incorporating the reliability of the $X_1$ scores, is necessary.


SECTION 10. CORRELATION BETWEEN RANKS


A simple formula for the correlation between ranks is readily derived from the product-moment formula by expressing the covariance as a function of the variance of the differences of paired measures $x_1$, $x_2$.

$$V_{(x_1 - x_2)} = V_1 + V_2 - 2c_{12} = V_1 + V_2 - 2\sigma_1\sigma_2 r_{12}$$  [10:51]

Thus

$$r_{12} = \frac{V_1 + V_2 - V_{(x_1 - x_2)}}{2\sigma_1\sigma_2}$$  [10:52]
Difference formula for product moment r

We record, as being at times convenient, a similarly derived formula based upon the variance of the sum of paired measures:

$$r_{12} = \frac{V_{(x_1 + x_2)} - V_1 - V_2}{2\sigma_1\sigma_2}$$  [10:53]
Sum formula for product moment r

or combining these two

$$r_{12} = \frac{V_{(x_1 + x_2)} - V_{(x_1 - x_2)}}{4\sigma_1\sigma_2}$$  [10:54]
Sum and difference formula for product moment r
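
That the difference, sum, and sum-and-difference formulas all return the ordinary product moment r may be verified numerically. The sketch below uses a small set of hypothetical paired scores invented purely for illustration, and computes variances with N (not N - 1) in the denominator.

```python
import math
from statistics import fmean, pvariance


def r_three_ways(x1, x2):
    """Product moment r by the difference, sum, and sum-and-difference formulas."""
    v1, v2 = pvariance(x1), pvariance(x2)
    s1s2 = math.sqrt(v1 * v2)
    v_diff = pvariance([a - b for a, b in zip(x1, x2)])
    v_sum = pvariance([a + b for a, b in zip(x1, x2)])
    return (
        (v1 + v2 - v_diff) / (2 * s1s2),   # [10:52]
        (v_sum - v1 - v2) / (2 * s1s2),    # [10:53]
        (v_sum - v_diff) / (4 * s1s2),     # [10:54]
    )


def r_direct(x1, x2):
    """Product moment r from the covariance and the two standard deviations."""
    m1, m2 = fmean(x1), fmean(x2)
    c12 = fmean([(a - m1) * (b - m2) for a, b in zip(x1, x2)])
    return c12 / math.sqrt(pvariance(x1) * pvariance(x2))


# Hypothetical paired measures, for illustration only.
x1 = [2.0, 4.0, 5.0, 7.0, 9.0]
x2 = [1.0, 3.0, 6.0, 6.0, 10.0]
print(r_three_ways(x1, x2), r_direct(x1, x2))
```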


If, as in the case of ranked data involving no missing ranks from 1 to N, we have $V_1 = V_2$, formula [10:52] simplifies. From [14:128] giving the sum of N numbers, 1, 2, 3, ..., N, and of their squares, we have, if X stands for the successive ranks 1, 2, 3, ..., N,

$$\sum X = \frac{N^2}{2} + \frac{N}{2} = \frac{N(N+1)}{2}$$

so that

$$M_1 = M_2 = \frac{N+1}{2}$$  [10:55]
The mean of N ranks

Also

$$\sum X^2 = \frac{N^3}{3} + \frac{N^2}{2} + \frac{N}{6} = \frac{N(N+1)(2N+1)}{6}$$

so that

$$\text{Mean } X^2 = \frac{(N+1)(2N+1)}{6}$$  [10:56]

$$V_1 = V_2 = (\text{mean } X^2 - M^2) = \frac{N^2 - 1}{12}$$  [10:57]
The variance of N ranks

Let d stand for the differences in paired ranks, i.e., for $(x_1 - x_2)$ of formula [10:52] when the scores in question are ranks, and let $\rho_{12}$ be the product moment correlation between ranks, then, utilizing [10:52] and [10:57]

$$\rho_{12} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$  [10:58]
Spearman rank correlation coefficient
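
A short numerical check may help to show that [10:58] is nothing other than the product moment formula applied to untied ranks. The two rankings below are hypothetical, and the function names are illustrative only.

```python
import math
from statistics import fmean, pvariance


def spearman_rho(ranks1, ranks2):
    """Rank correlation by [10:58]; valid for untied ranks 1..N."""
    n = len(ranks1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def product_moment_r(x, y):
    """Ordinary product moment correlation, for comparison."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / math.sqrt(pvariance(x) * pvariance(y))


# Hypothetical rankings of five individuals on two traits.
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 4, 3, 5]
rho = spearman_rho(ranks_a, ranks_b)
print(rho, product_moment_r(ranks_a, ranks_b))
```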


This formula must not be confused with Spear- 
man’s "foot-rule" rank correlation formula, which, 
though simpler, has such shortcomings as never 
to be preferred to [10:58]. 

The coefficient $\rho_{12}$ is exactly a product moment correlation coefficient, when scores are ranks, and if used in a regression equation in which the predictor variable is expressed as a rank, it should not be altered in any way. In a theoretical treatment, wherein the assumption is made that the real distributions underlying the ranks are normal, Pearson (1907) gives a formula for estimating from ρ the r that would have been obtained had the continuous normally distributed variables been used. It is

$$r = 2\sin\left(\frac{\pi}{6}\rho\right)$$  [10:59]
Pearson's correction to Spearman's ρ

This correction is small and in general it suffices to report ρ and its standard error. The standard error of ρ and of r-from-ρ have been given by Pearson and are, after first substituting (N-2) for N, to allow for the loss of two degrees of freedom, as herewith

$$\sigma_\rho = \frac{1 - \rho^2}{\sqrt{N-2}}\,(1 + .086\rho^2 + .013\rho^4 + .002\rho^6)$$  [10:60]
Standard error of Spearman's ρ

$$\sigma_{r \text{ from } \rho} = 1.0472\,\frac{1 - r^2}{\sqrt{N-2}}\,(1 + .042r^2 + .008r^4 + .002r^6)$$  [10:61]
Standard error of r from ρ
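
The size of Pearson's correction and of the two standard errors is easily tabulated. The following sketch assumes the forms of [10:59], [10:60] and [10:61] as read in this section, with N - 2 already substituted for N; the function names are hypothetical.

```python
import math


def r_from_rho(rho):
    """Pearson's estimate of r from Spearman's rho, [10:59]."""
    return 2 * math.sin(math.pi * rho / 6)


def se_rho(rho, n):
    """Standard error of Spearman's rho, [10:60]."""
    series = 1 + 0.086 * rho**2 + 0.013 * rho**4 + 0.002 * rho**6
    return (1 - rho**2) / math.sqrt(n - 2) * series


def se_r_from_rho(r, n):
    """Standard error of r estimated from rho, [10:61]."""
    series = 1 + 0.042 * r**2 + 0.008 * r**4 + 0.002 * r**6
    return 1.0472 * (1 - r**2) / math.sqrt(n - 2) * series


for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"rho = {rho:.2f}  r = {r_from_rho(rho):.4f}  difference = {r_from_rho(rho) - rho:+.4f}")
```

The correction vanishes at ρ = 0 and ρ = 1 and is small in between, which bears out the remark that it generally suffices to report ρ.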
If one of the paired measures is a normally distributed variate and the other a series of ranks, though the underlying variable is in fact normally distributed, and if the product moment correlation between them is called ρ', Pearson (1914) gives the following formula to estimate the correlation had the normally distributed scores been available:

$$r = \sqrt{\frac{\pi}{3}}\,\rho' = 1.0233\,\rho'$$  [10:62]


Correction in ρ for ties in ranks. It frequently occurs that two or more ranks are tied. In this case all of the tied ranks are given a single value, which is the average of the ranks represented by the tied measures. For example, the 11 scores


LOEO TD TE ONES 4; 4 
are given rank values 

1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.5, 10.5

The mean of these is exactly the value given by [10:55], but mean X² for these is not identical with the answer given by [10:56], and $V_1$ is not identical with the value given by [10:57]. Horn (1942) has noted that the correction in [10:57] is simple and has given tables and approximations to facilitate computation when there are different numbers of ranks in the ties.

We note that if k ranks X, X+1, X+2, ..., X+k-1 are replaced by their average, X+.5(k-1), and if their squares X², (X+1)², ..., (X+k-1)² are each replaced by [X+.5(k-1)]², the sum of the latter squares is less than the former by an amount which is equal to k times the variance of k ranks, thus

$$\sum_{j=0}^{k-1} (X+j)^2 = k\,[X + .5(k-1)]^2 + \frac{k(k^2 - 1)}{12}$$  [10:63]
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We may rewrite [10:58] thus 


$$\rho = 1 - \frac{\sum d^2}{2\sqrt{\frac{N(N^2-1)}{12}}\sqrt{\frac{N(N^2-1)}{12}}} = 1 - \frac{\sum d^2}{2\sqrt{NV_1}\sqrt{NV_2}}$$

Herein the V's are given by [10:57], but if there are tied ranks, $NV_1$ is too large by an amount equal to the sum of all the functions k(k²-1)/12 for all the ties in the first series and similarly $NV_2$ is too large by the sum of all the functions [k(k²-1)/12] for the second series. To illustrate in the case of the 11 ranks just given: the sum of the squares of the measures 1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.5, 10.5 is equal to the sum of the squares of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 minus {[2(2²-1)/12] + [3(3²-1)/12] + [2(2²-1)/12]}, because of ties 3.5, 3.5 and 7, 7, 7 and 10.5, 10.5. The deduction is 3 (= .5 + 2 + .5). We accordingly, in this instance, deduct 3 and set

$$NV_1 = \frac{11(11^2 - 1)}{12} - 3 = 107$$

Of course, a similar correction would be necessary for $NV_2$ so that the formula for ρ becomes

$$\rho = 1 - \frac{6\sum d^2}{\sqrt{N(N^2-1) - \sum k_1(k_1^2-1)}\,\sqrt{N(N^2-1) - \sum k_2(k_2^2-1)}}$$  [10:64]
ρ corrected for ties in ranks
Herewith is a short table of k(k²-1) values for different numbers of ties in ranks:

TABLE X C
CORRECTIONS WHEN COMPUTING ρ FROM TIED RANKS

k: No. of ranks tied |  2 |  3 |  4 |   5 |   6 |   7 |   8 |   9 |  10
k(k²-1)              |  6 | 24 | 60 | 120 | 210 | 336 | 504 | 720 | 990
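
The deduction for ties can be checked against a direct computation of the sum of squared deviations. In the sketch below the helper names are hypothetical; `rho_tie_corrected` follows the form of [10:64], and the eleven rank values are those of the illustration above.

```python
import math
from collections import Counter


def tie_terms(ranks):
    """Sum of k(k^2 - 1) over groups of tied ranks (untied ranks contribute 0)."""
    return sum(k * (k * k - 1) for k in Counter(ranks).values())


def sum_squared_deviations(ranks):
    """N times the variance of the rank values, computed directly."""
    mean = sum(ranks) / len(ranks)
    return sum((x - mean) ** 2 for x in ranks)


def rho_tie_corrected(ranks1, ranks2):
    """Spearman's rho with the correction for ties of [10:64]."""
    n = len(ranks1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    full = n * (n * n - 1)
    return 1 - 6 * sum_d2 / math.sqrt((full - tie_terms(ranks1)) * (full - tie_terms(ranks2)))


ranks = [1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.5, 10.5]
n = len(ranks)
by_formula = n * (n * n - 1) / 12 - tie_terms(ranks) / 12
print(by_formula, sum_squared_deviations(ranks))
```

Both routes give 107 for the eleven ranks, and with no ties the corrected formula reduces to [10:58].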
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SECTION 11. BISERIAL CORRELATION 
A frequent correlation situation is one in which one of the paired variables is dichotomous. Chart X II illustrates this case. X is the graduated variable and Y the two-category variable. With such data, in addition to the straightforward problems of estimating X knowing Y and of estimating Y knowing X, we may be interested in estimating the correlation and the regressions maintaining between X and Y', a continuous normally distributed variable which has been forced into the dichotomy given by Y.

CHART X II
A BISERIAL SCATTER DIAGRAM
[Scatter diagram; labels: "Value of Variate," "1st Category," "2nd Category"]

The essential statistics obtained from a scatter diagram are shown in Chart X II.

The units of measurement of X may well be those of the raw scores, grouped in not less than 12 classes if desired. In the case of Y, simplicity results if the lower score is called 0 and the higher score called 1. If the proportion of cases receiving a 0 score is q, then the constants of the Y distribution are:

$$M_y = p$$  [10:65]

$$V_y = pq$$  [10:66]

For the X series we compute $M_x$ and $V_x$. The number of cases is N and the covariance is

$$c_{xy} = \frac{\sum X_l \cdot 0 + \sum X_u \cdot 1}{N} - M_x M_y = pM_u - pM_x = pq(M_u - M_l)$$  [10:67]


in which the $X_l$ measures are those paired with a zero Y score and the $X_u$ measures are those paired with a Y score of 1. Also $M_l$ is the mean of the X measures in the lower group and $M_u$ the mean of the X measures in the upper group.

$$b_{xy} = \frac{pq(M_u - M_l)}{pq} = M_u - M_l$$  [10:69]
Regression of X upon the two-category variable

$$r_{xy} = \frac{(M_u - M_l)\sqrt{pq}}{\sigma_x}$$  [10:70]
Biserial product moment r

$$b_{yx} = \frac{(M_u - M_l)\,pq}{V_x}$$  [10:71]

$$\bar{y} = \frac{(M_u - M_l)\,pq}{V_x}\,x + p$$  [10:72]

The variance ratio test of the regression coefficient $(M_u - M_l)$ is given by [10:39], and the variance ratio test of $M_y$ by [10:38].

If N is small, these tests, [10:39] and [10:38], for the regression $(M_u - M_l)pq/V_x$ and the mean, $M_y$ (= p), should be applied with misgiving because the variances involved derive from two-point distributions.

A test of $(M_u - M_l)$ alternative to [10:39] is the critical ratio test $(M_u - M_l)/\sigma_{M_u - M_l}$ which squared is,

$$\frac{(M_u - M_l)^2}{\dfrac{V_u}{N_u} + \dfrac{V_l}{N_l}}$$  (See [6:19]) [10:73]


If, in the population У,5 Үр, then approximately 
V, VN 
Е М 
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and the just recorded squared critical ratio be- 
comes 


$$F_{1,N-2} = \frac{(M_u - M_l)^2\,pq\,(N-2)}{V_x(1 - r_{xy}^2)}$$

which is exactly the [10:39] variance ratio test of the regression coefficient $(M_u - M_l)$. Thus, if the data are homoscedastic (equal variability of arrays under the definition of equal variability of


N, Np 
er My =P 


and not the older definition, $V_u = V_l$), the [10:39] test is adequate, but if $V_u$ does not approximately equal $V_l$ the squared critical ratio test [10:73] is the more precise.

Estimates of biserial relationships, assuming the dichotomy is one forced upon an underlying normal variate. Under this assumption we can compute the correlation and the regression equations, but as the assumption itself is always open to serious questioning, though how serious we cannot express quantitatively or in probability terms, a computation of the standard errors of these modified correlation and regression constants is objectionable. It could only lead to fiducial results in which the presumably major source of error (assumption of normality) was neglected while the minor source (small size of sample) was treated as though it were the sole source of error.

If the Y measures recorded in the two-category manner had derived from a normal distribution having a mean of zero and a variance of one, the means of the lower and upper categories would be -z/q and z/p respectively, in which z is the ordinate in a unit normal distribution at that point for which the proportion of cases to the left is q, and the proportion to the right is p, as given by [8:25] and [8:26]. Further, the regression of X upon this normal variate, which we designate y' (the same as x of tables of the unit normal distribution), will be a line passing through the intersection of $M_l$ and -z/q and the intersection of $M_u$ and z/p. In Chart X II, the sum of the absolute values of the distances $x_l$ and $x_u$ equals $(M_u - M_l)$, and the sum of the absolute values of the distances $y_l$ and $y_u$ is, in terms of the unit normal distribution,

$$\frac{z}{p} + \frac{z}{q}, \text{ which } = \frac{z}{pq}, \text{ so that}$$

$$b_{xy'} = \frac{(M_u - M_l)\,pq}{z}$$  [10:74]
Regression of actual continuous variate upon assumed normal variate

Of course $V_{y'} = 1$, so that

$$r_{xy'} = \frac{(M_u - M_l)\,pq}{\sigma_x z}$$  [10:75]
Biserial r

The designation of this as "biserial r" follows long established practice, but a more accurate designation would be "biserial r corrected for an enforced dichotomy upon a normal variate."
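
The two coefficients may be placed side by side in code. This is a sketch under the readings of [10:70] and [10:75] given in this section; the function names are hypothetical, the ordinate z is obtained from the standard library's `statistics.NormalDist`, and the numerical inputs in the usage lines are invented for illustration and are not data from the text.

```python
import math
from statistics import NormalDist


def ordinate_at_split(p):
    """Unit normal ordinate z at the point with proportion q = 1 - p below it."""
    unit_normal = NormalDist()
    return unit_normal.pdf(unit_normal.inv_cdf(1 - p))


def biserial_product_moment_r(m_u, m_l, sigma_x, p):
    """Product moment r between X and the 0-1 variable Y, as in [10:70]."""
    return (m_u - m_l) * math.sqrt(p * (1 - p)) / sigma_x


def biserial_r(m_u, m_l, sigma_x, p):
    """Biserial r of [10:75], which assumes a normal variate under the dichotomy."""
    return (m_u - m_l) * p * (1 - p) / (sigma_x * ordinate_at_split(p))


# Hypothetical constants: upper mean 60, lower mean 50, sigma 10, even split.
r_pm = biserial_product_moment_r(60.0, 50.0, 10.0, 0.5)
r_bis = biserial_r(60.0, 50.0, 10.0, 0.5)
print(f"product moment r = {r_pm:.4f}, biserial r = {r_bis:.4f}")
```

The ratio of the two coefficients is the square root of pq divided by z, which always exceeds 1, so biserial r is always the larger in absolute value and, as Table X D shows, can exceed unity when the assumption of normality fails.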

The regression equation of X upon y' serves no practical need since one does not actually have y'. The other regression is

$$\bar{y}' = \frac{(M_u - M_l)\,pq}{V_x z}\,x$$  [10:76]

The variance error of estimate of y' (assuming zero error in the assumption of normality of y') is

$$V_{y'\cdot x} = 1 - (\text{biser.}\,r)^2$$  [10:77]


The variance error of biserial r, as given by 
Soper (1914), is 


l,pa 3 рут ПИ. 
blser.r ian Frat m un) 


(biser.r)?*(biser.r)') + + [10:78] 


but of course this variance error does not have 
incorporated in it the error because of the 
assumption of normality. For dichotomies wherein 
q is not less than .05, a close approximation to 


[10:78] is 


m 
[ a (biser.r)?]? 
МЗ ааа .. [10:79] 


A common practice in the statistical analysis of true-false, or right-wrong, test items has been to judge of their excellence by the size of the biserial correlations between them and a graduated criterion, or by the critical ratios biser.r/$\sigma_{\text{biser.}r}$. In view of the fact that the continuous variable y' is not actually available, nor presumably ever will be available for it is not contemplated that the future scoring of such an item will ever be other than "right" or "wrong", and in view of the shortcomings of the formula for the standard error of biserial r, it would seem better to employ biserial product moment r and the precise tests connected therewith. The hazards of employing biserial r instead of a real product moment r, such as is biserial product moment r, are increased if several items of a test are combined in a multiple regression equation to estimate a graduated criterion, (a) because the best weights, in the minimal variance error of estimate sense, derive from the product moment r's and cannot be improved upon in any manner, except by resort to higher order, or curvilinear regression, and (b) because the precise tests existing for product moment multiple and partial correlation do not apply if correlation coefficients other than product moment coefficients are employed. The observations in Section 12 (tetrachoric r) upon dichotomies and range of talent apply in biserial r situations.

The writer has been kindly supplied with the data of Table X D, which arose in the experimental work of L. R. Waldron, State College Station, Fargo, North Dakota.


TABLE X D
CERTAIN BISERIAL DATA

         Y = 0     Y = 1
  X      No. of    No. of
         cases     cases      Resulting constants
 10                  2
  9                  4        M_u = 7.55
  8                 15        M_l = 3.05
  7                 12        σ_x = 2.367
  6                  8
  5        3         1        p = 2/3
  4        4
  3        7                  z = .3636
  2        5                  Product moment
  1        2                  biserial r = .90
 Total    21        42        Biserial r = 1.13


Here the assumption of normality in the Y variate is so untenable as to lead to an absurd biserial r. Should a situation be only a little less extreme than this, it would not be revealed by yielding an impossible r value. The student should make the assumption of normality only after careful consideration and he should not expect an unsoundness in this respect to be automatically revealed by absurd statistical outcomes.

It seems to the writer fairly satisfactory to use biserial r as a terminal statistic in connection with the data of Table X E drawn from Psychological Examining in the United States Army (see Yerkes 1921).


TABLE X E 
SCHOOL ATTAINMENT AND ARMY ALPHA SCORES 


SCORE IN ARMY ALPHA 
INTELLIGENCE TEST 


NUMBER OF MEN WHO LEFT SCHOOL 


BELOW THE 9TH GRADE | ABOVE THE 8TH GRADE 


205-212 [ 
200-204 3 
195-199 14 
190-194 17 
185-189 49 
180-184 54 
175-179 78 
170-174 126 
165-169 149 
160-164 200 
155-159 244 
150-154 305 
145-149 352 
140-144 338 
135-139 +07 
130-134 507 
125-129 528 
120-124 530 
115-119 643 
110-114 674 
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ТАВЕЕ ХЕ 
(CONTINUED) 
SCHOOL ATTAINMENT AND ARMY ALPHA SCORES 


SCORE IN ARMY ALPHA 
INTELLIGENCE TEST 


R OF MEN WHO LEFT SCHOOL 


NUMB 
BELOW THE 9TH GRADE ABOVE THE 8ТН GRADE 


105-109 682 
100-104 691 
95- 99 712 
90- 94 725 
85- 89 769 
80- 84 593 
75- 79 642 
70- 7% 648 
65- 69 567 
60- 6 581 
55- 59 430 
50- 54 346 
45- 49 305 


40- 


M_u = 98.158          M_l = 54.987
σ_x = 36.606          p = .28735
z = .34083

biserial product moment r = .541

biserial r = .718
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Considerations in support of the utility of 
this biserial correlation are (a) the naive concept of the correlation between intelligence test score and schooling would assuredly involve the amount of schooling as a graduated variable, (b) the future actual treatment of schooling as a graduated variable is entirely possible, (c) the distribution of "highest school grade attended" is probably not radically non-normal, (d) the correlation is so high and the sample so large that a precise test of the significance of ($r_{\text{biser}}$ - 0) is not needed, and (e) presumably this statistic is a terminal statistic and will not be incorporated into further algebraic treatment.


SECTION 12. — TWO-BY-TWO-FOLD CORRELATION 


Phi: The issue of continuity which arose in connection with the Y variable in the case of biserial correlation, here arises in connection with both variables. We first ascertain the pertinent statistical constants for the actual point distributions as given by the data. In this case the product moment correlation has been known as Yule's φ, and is here designated φ. It is, in every detail, a rigorous product moment correlation coefficient, and the use of φ instead of r to designate it merely calls attention to the fact that it is computed from a two-by-two-fold. Its computation is simple and its use calls only for a certain reservation in the case of variance ratio or chi-square tests applied to very small samples.

It has been customary to plot two-by-two-fold scatter diagrams as in Charts X III and X IV, so that the proportions p and p' are each equal to or greater than .5, even though this involves divergence from the usual scaling of abscissa from left to right and of ordinate from bottom to top. The X and Y scales are indicated in Chart X IV.


CHART X III
TWO-BY-TWO-FOLD SCATTER DIAGRAM

            Y = 1    Y = 0
  X = 1       a        b       a+b
  X = 0       c        d       c+d
             a+c      b+d       N


CHART X IV
TWO-BY-TWO-FOLD SCATTER DIAGRAM
BASED UPON PROPORTIONS

            Y = 1    Y = 0
  X = 1       α        β        p
  X = 0       γ        δ        q
              p'       q'       1


α = a/N; β = b/N; γ = c/N; δ = d/N; p = (a+b)/N; p' = (a+c)/N. It is readily established that $M_x = p$; $V_x = pq$; $M_y = p'$; $V_y = p'q'$; and the covariance

$$c_{xy} = \alpha\delta - \beta\gamma$$  [10:80]


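$$\phi = r_{xy} = \frac{c_{xy}}{\sigma_x\sigma_y} = \frac{ad - bc}{N^2\sigma_x\sigma_y} = \frac{\alpha\delta - \beta\gamma}{\sqrt{pq\,p'q'}}$$  [10:81]
Product moment r in a two-by-two-fold

$$b_{xy} = \frac{\alpha\delta - \beta\gamma}{p'q'}$$  [10:82]
And similarly for $b_{yx}$

$$\bar{x} = b_{xy}\,y = \phi\sqrt{\frac{pq}{p'q'}}\;y$$  [10:83]
And similarly for $\bar{y}$

$$F_{1,N-2} = \frac{(p - \tilde{p})^2 (N-2)}{pq(1 - \phi^2)}$$  [10:84]
Variance ratio to test $(p - \tilde{p})$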
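
The computation of φ is simple enough to set out in a few lines. The sketch below follows [10:80] and [10:81] as read here, with cell frequencies a, b, c, d arranged as in Chart X III; the function names and the four frequencies in the usage lines are hypothetical.

```python
import math


def phi_from_counts(a, b, c, d):
    """Product moment r of a two-by-two-fold from cell frequencies, [10:81]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def phi_from_proportions(alpha, beta, gamma, delta):
    """The same coefficient from cell proportions, using the covariance of [10:80]."""
    p, p_prime = alpha + beta, alpha + gamma
    covariance = alpha * delta - beta * gamma
    return covariance / math.sqrt(p * (1 - p) * p_prime * (1 - p_prime))


# Hypothetical frequencies, for illustration only.
a, b, c, d = 30, 10, 20, 40
n = a + b + c + d
phi_counts = phi_from_counts(a, b, c, d)
phi_props = phi_from_proportions(a / n, b / n, c / n, d / n)
print(f"phi = {phi_counts:.4f} (counts), {phi_props:.4f} (proportions)")
```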


This may be compared with the critical ratio test for the mean, available when no second variable Y is utilized. The error variance, pq(1-φ²), may be looked upon as the sum of (N-2) independent variables arising from two-point distributions. The tabled distributions of F, or of Student's t², assume that these independent variances arise from normal distributions. For the usual situations in which N is, say, greater than 15, highly serviceable answers result from employing the usual P from F.

$$F_{1,N-2} = \frac{(b_{xy} - \tilde{b}_{xy})^2\,p'q'\,(N-2)}{pq(1 - \phi^2)}$$  [10:85]
Variance ratio to test $(b_{xy} - \tilde{b}_{xy})$

The $\tilde{p}$ in [10:84] and the $\tilde{b}_{xy}$ in [10:85] are a priori values, or points of reference in which we are interested.

For samples of small size precise tests of significance, as illustrated in Chapter IX, Section 2, are preferable to tests based upon [10:84]
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The variance error of ф, as derived by Yule 
(see Pearson and Heron, 1913) is 


T 2 295 
Vs DRE + (ф eu. EERE B 


qi 
(eo Pn one tron [10:86] 
Pq а two-by-two- 
р ааа агер") fold table 
Tetrachoric correlation: This correlation coefficient is an estimate of the correlation to be expected in a parent normal bivariate distribution which, when forced into a two-by-two-fold, yielded the observed distribution. If an approximately normal bivariate parent may reasonably be assumed to exist, if the issue is theoretical or one concerning this parent and not a predictive problem which must use the available two-category measures, if a precise test of significance is not necessary because the sample is large and the magnitude of the correlation rather than its existence is the matter of importance, and if the correlation coefficient obtained is not to be used in further connections wherein a test of significance will be needed, then tetrachoric r will be a useful statistic. The writer has noted the frequent use of tetrachoric r in multiple regression equations in psychological research, but as these almost always lead to issues requiring tests of significance this practice has serious shortcomings.

Pearson (1900) derived equation [10:87] in which $r_t$ is the tetrachoric correlation coefficient, q, q', and δ are as indicated in Chart X IV, x and z are deviate and ordinate in a unit normal distribution at the point of dichotomy of the X variable and x' and z' the deviate and ordinate in a unit normal distribution at the point of dichotomy of the Y variable.


$$\frac{\delta - qq'}{zz'} = r_t + xx'\frac{r_t^2}{2!} + (x^2-1)(x'^2-1)\frac{r_t^3}{3!} + (x^3-3x)(x'^3-3x')\frac{r_t^4}{4!}$$
$$\qquad + (x^4-6x^2+3)(x'^4-6x'^2+3)\frac{r_t^5}{5!} + (x^5-10x^3+15x)(x'^5-10x'^3+15x')\frac{r_t^6}{6!}$$
$$\qquad + (x^6-15x^4+45x^2-15)(x'^6-15x'^4+45x'^2-15)\frac{r_t^7}{7!} + \cdots$$  [10:87]
Equation giving $r_t$, the tetrachoric coefficient of correlation


To express the law governing successive coefficients of powers of $r_t$, let $v_{n-1}w_{n-1}/n!$ be the coefficient of $r_t^n$, $v_n$ be a function of x, and $w_n$ a function of x'; then $v_n$ may be expressed in terms of v's of a lower order:

$$v_n = x\,v_{n-1} - (n-1)\,v_{n-2}$$

and similarly

$$w_n = x'\,w_{n-1} - (n-1)\,w_{n-2}$$

Thus the equation as written to the $r_t^7$ term may be continued to any number of additional terms desired should it not converge rapidly enough to make terms above the $r_t^7$ negligible.
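
The series and its law of coefficients lend themselves to a short program. The sketch below is offered under stated assumptions and is not Pearson's own method of solution: it builds the coefficient functions by the recurrence v_n = x·v_{n-1} - (n-1)·v_{n-2}, starting from v_0 = 1 and v_1 = x, takes the deviates at the points with proportions q and q' below them, and solves the truncated series for r_t by simple bisection, which presumes the series to be increasing in r_t over the interval searched. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def coefficient_functions(x, n_max):
    """Values v_0 .. v_n_max with v_n = x*v_(n-1) - (n-1)*v_(n-2)."""
    v = [1.0, x]
    for n in range(2, n_max + 1):
        v.append(x * v[n - 1] - (n - 1) * v[n - 2])
    return v[: n_max + 1]


def tetrachoric_series(r, x, x_prime, n_terms=7):
    """Right-hand member of [10:87], truncated after the r**n_terms term."""
    v = coefficient_functions(x, n_terms - 1)
    w = coefficient_functions(x_prime, n_terms - 1)
    return sum(v[n - 1] * w[n - 1] * r**n / math.factorial(n) for n in range(1, n_terms + 1))


def tetrachoric_r(delta, q, q_prime, n_terms=7):
    """Solve the truncated series for r_t by bisection (assumes a monotone series)."""
    unit_normal = NormalDist()
    x, x_prime = unit_normal.inv_cdf(q), unit_normal.inv_cdf(q_prime)
    target = (delta - q * q_prime) / (unit_normal.pdf(x) * unit_normal.pdf(x_prime))
    low, high = -0.999, 0.999
    for _ in range(60):
        mid = (low + high) / 2
        if tetrachoric_series(mid, x, x_prime, n_terms) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Check case: both variables split at the median with a true correlation of .5,
# for which the cell proportion is 1/4 + arcsin(.5)/(2*pi) = 1/3.
delta_check = 0.25 + math.asin(0.5) / (2 * math.pi)
r_seven_terms = tetrachoric_r(delta_check, 0.5, 0.5)
r_many_terms = tetrachoric_r(delta_check, 0.5, 0.5, n_terms=25)
print(f"{r_seven_terms:.5f} {r_many_terms:.5f}")
```

With median dichotomies the seven-term series already recovers the correlation closely, and additional terms remove the small remaining discrepancy.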

Extensive tables to facilitate the computation
of $r_t$ have been computed by P. F. Everitt, Alice
Lee, Margaret Moul, Ethel M. Elderton, A. E. R.
Church, E. C. Fieller, and J. Pretorius. These
are given in Pearson's Tables (1914).

The most extensive and useful of these tables
give the proportionate area for different dicho-
tomies in both X and Y maintaining for correla-
tions -.95, -.90, -.85, etc., to .95, 1.00. The
interval in x and x' (designated h and k in the
tables) is .1σ. Thus, having any x, x', and δ,
one can find $r_t$ by interpolation between two
successive tables. The tables have many other
uses, including the determining of the propor-
tionate frequency in a normal bivariate distri-
bution in any desired rectangular interval (the
sides of the rectangle being parallel to the
axes) for any of the correlations -.95, -.90,
..., 1.00.

Extensive diagrams for the computation of $r_t$
have been published by Chesire, Saffir, and
Thurstone (1933).

If an assumption of normality for one only of
the variables seems reasonable, the other vari-
able being a genuine dichotomy, the problem
reduces to that of biserial r, the graduated
variable now having but two values. These re-
quire no correction for grouping for it is not
postulated that any continuum underlies this
variable. We will use the data of Table X F, also
drawn from Psychological Examining in the United
States Army, to illustrate $\phi$, biserial r from a
two-by-two-fold, and $r_t$.


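TABLE X F


ARMY ALPHA SCORES OF LIEUTENANTS IN
MEDICAL AND NON-MEDICAL DEPARTMENTS


[Two-by-two table: intelligence alpha test grade
(A or B; below B) against department (medical;
departments other than medical). The cell
frequencies are illegible in this copy.]


TABLE X G


FREQUENCIES OF TABLE X F EXPRESSED AS PROPORTIONS

.58527 = p;    .215215 = x;    .389809 = z

.82277 = p';   .17723 = q';    .925704 = x';   .259915 = z'

.148 = δ

[The remaining cell proportions are illegible in
this copy.]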


Substituting the values given in Table X G for
δ, x, x', z, and z' in equation [10:87] and
solving for $r_t$ by the method of successive ap-
proximations given in Chapter XIV, Section 7,
yields $r_t$ = .277. Computation of biserial r,
assuming a normal distribution to underlie the
intelligence test variable, yields biserial r = .195.
Computation of the product moment biserial cor-
relation yields $\phi$ = .154.

To predict the department, knowing the dichot-
omous army alpha score, calls for the use of $\phi$.
The correlation is positive when non-medical de-
partments are at the high end of the scale and
the medical department at the low end. To answer
the theoretical issue as to the indicated corre-
lation if adequate X and Y variables were availa-
ble, neither $\phi$ nor biserial r is very satisfactory.
Biserial r accepts the point distribution of the
"Medical department-other departments" variable as
reasonable, and this is certainly very question-
able. $\phi$ accepts the point nature of this varia-
ble and also of the intelligence variable, so
that it is doubly open to question. The writer
is prone to put more confidence in $r_t$ as de-
scriptive of the underlying relationship than in
the other two measures, but cannot defend this
statistically and would hesitate to attach a
standard error to the obtained value .277. (The
$\sigma_{r_t}$ given by [10:88] is .016.)

*If the assumption of normality for both under- 
lying distributions is unquestionable, the follow- 
ing formula, due to Pearson (1913 Prob.), gives 
an approximate answer for the variance error of 
tetrachoric r: 


LU 4 гуд» 
1629) он аа [10:88] 


As we may, in general, believe that [10:88]
understates the variance error, the result given
by it may be taken as a lower estimate and when
so taken we find that tetrachoric r is very un-
reliable when dichotomies are extreme.

If the two-by-two-fold is such that p = q = p' = q'
= .5, then [10:87] simplifies to


$$r_t = \cos(2\pi\beta)$$   [10:89]

Tetrachoric correlation when dichotomic lines
are the medians

This is a useful formula to use in the prelimin-
ary investigation of paired graduated measures.
Calling cases above the median +, for each vari-
able, and cases below the median -, then a simple
count gives the number of pairings of unlike
sign. This number divided by N equals $2\beta$, which
is U, the proportion of unlike sign pairs, of
formula [10:89a].


$$r_t = \cos(\pi U)$$   [10:89a]

Unlike signs formula for $r_t$
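
As an illustration of the unlike signs formula [10:89a], the sketch below counts the pairs falling on opposite sides of the two medians and takes the cosine of pi times their proportion. It assumes that no score coincides with its median (so that every case is clearly + or -); the function name and the sample data are hypothetical.

```python
import math
from statistics import median


def unlike_signs_r(xs, ys):
    """Tetrachoric estimate cos(pi * U) with both dichotomies at the medians.

    U is the proportion of pairs in which one measure is above its median
    and the other below. Scores equal to a median are not handled here.
    """
    med_x, med_y = median(xs), median(ys)
    unlike = sum(1 for x, y in zip(xs, ys) if (x > med_x) != (y > med_y))
    u = unlike / len(xs)
    return math.cos(math.pi * u)


# Hypothetical paired measures on eight cases.
first = [1, 2, 3, 4, 5, 6, 7, 8]
second = [2, 1, 6, 3, 4, 8, 5, 7]
estimate = unlike_signs_r(first, second)
```

In the hypothetical data two of the eight pairs are of unlike sign, so U = .25 and the estimate is the cosine of 45 degrees.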


We may use for the variance error in this case


ire s 
Va 1 Tt Variance error of Гү 
23 N 27 aß when dichotomic lines (10:90] 


are at the medians 


In standardized test construction it is com- 
mon: (a) to use a preliminary form of a test 
having more items (scored right or wrong) than 
will be retained in the final form; (b) to give 
this preliminary form to an experimental group 
for the members of which there can be gotten a 
dichotomous criterion score (if the criterion 
score is graduated, it is frequently dichotomized 
at a point at or near the median); and (c) to 
appraise the merits of the separate items by 
their correlations with the criterion. Yule's $\phi$
being a product moment correlation is, in the
least squared error sense, the best measure of
excellence and any other measure, such as $r_t$,
is a less accurate statement of the relative ex-
cellence of the item in its power to predict the
dichotomous criterion score. However, $r_t$ has
been widely used in this situation and though it
has been, in the writer's opinion, generally mis-
used there nevertheless is a real argument in
its favor when the experimental group is more
homogeneous than the future population to which
it is anticipated the test will be given. For
example, consider an Air Corps flying aptitude
test standardized upon the basis of Air Corps
cadets, but used later upon the more hetero-
geneous group of Air Corps cadet candidates. An
easy item upon which, say, 90 per cent of cadets
and 50 per cent of candidates pass, yields in the
experimental group a considerably higher $r_t$ than
$\phi$. The $\phi$ reveals the excellence of the item for
such samples as the experimental group, whereas
$r_t$ will in general be a better estimate than $\phi$
of the excellence of the item for candidate
groups. The general policy of using an item or
a test for a different range of talent than that
for which there is validation evidence can be
cogently criticized, but such practice is at
times unavoidable and logical considerations may
suggest that $r_t$, which is an estimate of a situ-
ation other than that observed, may have prac-
tical merit, in spite of many shortcomings, in-
cluding the lack of any precise tests of signifi-
cance. This latter lack is much more serious
in factor analysis studies than in item analyses,
so the use of $r_t$'s in factor analysis seems to
the writer to be inexpedient.


SECTION 13. ADJUSTMENTS FOR COARSENESS OF GROUPING 


The problems that arise generally concern 
theoretical issues of correlation, but practical 
problems are occasionally present. 

We will call the measure being predicted the 
criterion, or the predictand, and the other mea- 
sure the predictor or independent variable. In 
connection with the following regression equation, 
$\bar{Y}$ is the prediction, Y the predictand, and X the
predictor: 


ЕА 
Obviously if X is now, and will continue in the
future to be, available in a small number of
classes only, there is no point in estimating the
correlation, or the regression, that would main-
tain were X a finely graduated measure. In this
case no correction for grouping, as regards X, is
serviceable.

A similar argument does not always apply to
the predictand. Suppose the criterion to be a
two-category measure, "superior-inferior," of
medical efficiency. Though only two classes are
available in the experimental study determining
the relationship, one would ordinarily be more
concerned with predicting degrees of efficiency
than with predicting a two-category status. Thus
for the Y variable we may make the most reasona-
ble correction for grouping that is possible.
This correction, in the case of a presumably
normal distribution underlying a dichotomous cri-
terion, is given by biserial r and the attendant
regression equation. Even so, the procedure has
the serious drawback that precise tests of sig-
nificance and precise knowledge of the standard
error of estimate are not available. However, if
Y' is the continuous variable underlying the di-
chotomous Y measures, the prediction equation is
given by [10:76] and the best, though still
questionable, estimate of the variance error of
prediction is given by [10:78].

If the number of classes in the criterion is
greater than 11, we seldom need to be concerned
with a correction for grouping.

If the number of classes in the criterion lies
between 5 and 12, we have available one of two
corrections depending upon whether the variable
is of class indexes or of class means.

Corrections when class indexes are the vari-
able: We will illustrate this problem by the
scatter diagrams of Table X H and X I. The pro-
portionate frequencies listed in the cells are
those of a normal bivariate distribution with the
coarse groupings given. In each instance the
correlation for the non-grouped data would be
.80, and the variance would be 1.


TABLE X H


NORMAL BIVARIATE DISTRIBUTION IN WHICH THE
CORRELATION IS .80, IN CASE OF COARSE GROUPING


[Seven-by-seven table of proportionate cell
frequencies, with class indexes and class means
in the margins. The cell entries cannot be placed
from this copy; the class mean of the extreme
class is 2.8226.]


TABLE X I


NORMAL BIVARIATE DISTRIBUTION IN WHICH THE
CORRELATION IS .80, IN CASE OF VERY COARSE GROUPING


                           X
Y             -2          0          2       CLASS INDEXES

 2         .000056    .060963    .097637
 0         .060963    .560760    .060963
-2         .097637    .060963    .000056

          -1.5251      0         1.5251      CLASS MEANS
First consider the case in which the available
variable is the class index. We will designate
the variables given by the class indexes $y_i$
and $x_i$, the unavailable continuous
variables by Y and X, true statistics (that is,
of these unavailable variables) by a tilde, and
when a statistic is corrected for coarseness of
grouping we so indicate by the preceding sub-
script c. The true statistics for both Table X H
and X I are:

$\tilde{V}_Y = 1.0$; $\tilde{V}_X = 1.0$; $\tilde{c}_{XY} = .80$; $\tilde{r}_{XY} = .80$

Table X H (in which there is a slight error
because the distributions have been assumed to
terminate at ±3.5σ) yields:

$V_{y_i} = 1.08001$; $V_{x_i} = 1.08001$;
$c_{x_i y_i} = .79728$; $r_{x_i y_i} = .73822$
We note that there is negligible error in the
covariance, but decided error in the variance
and the correlation coefficient. In the case of
a variable, such as a normal variable, having
high order contact at both lower and upper ends,
Sheppard's correction for the variance [6:49] is
highly serviceable. Applying it we obtain,

${}_cV_{y_i} = .99668$; ${}_cV_{x_i} = .99668$; ${}_cr_{x_i y_i} = .79994$

Let us conclude that, having high order contact,
Sheppard's correction to the variance will yield
excellent approximations to true variances and
correlations when the number of classes in each
variable is six or greater.

No such excellent results are obtained with
Table X I, where there are but three classes in
each variable. We find (with a computational
error because of terminating distributions at
±3σ) that:

$V_{y_i} = 1.26924$; $V_{x_i} = 1.26924$; $c_{x_i y_i} = .78065$;
$r_{x_i y_i} = .61505$; ${}_cV_{y_i} = .93591$; ${}_cV_{x_i} = .93591$;
${}_cr_{x_i y_i} = .83411$

These results suggest that there is only a small
error in the covariance with very coarse grouping,
but that the error in variance and correlation is
serious in this instance and is not adequately
allowed for by Sheppard's correction.
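
The figures quoted for Table X I can be checked with a short computation. The sketch below assumes the nine cell proportions of Table X I as they can be read from the table, class indexes of -2, 0 and 2 (a class interval of 2), and Sheppard's correction taken as the subtraction of one-twelfth of the squared interval from the variance. The names are illustrative only.

```python
import math

# Cell proportions of Table X I; rows are Y classes 2, 0, -2 and
# columns are X classes -2, 0, 2 (an assumed reading of the layout).
CELLS = [
    [0.000056, 0.060963, 0.097637],
    [0.060963, 0.560760, 0.060963],
    [0.097637, 0.060963, 0.000056],
]
Y_INDEX = [2.0, 0.0, -2.0]
X_INDEX = [-2.0, 0.0, 2.0]
INTERVAL = 2.0


def grouped_moments(cells, x_vals, y_vals):
    """Variances and covariance of the class values, weighted by cell proportion."""
    total = sum(p for row in cells for p in row)
    mean_x = sum(p * x_vals[j] for row in cells for j, p in enumerate(row)) / total
    mean_y = sum(p * y_vals[i] for i, row in enumerate(cells) for p in row) / total
    var_x = sum(p * (x_vals[j] - mean_x) ** 2
                for row in cells for j, p in enumerate(row)) / total
    var_y = sum(p * (y_vals[i] - mean_y) ** 2
                for i, row in enumerate(cells) for p in row) / total
    cov = sum(p * (x_vals[j] - mean_x) * (y_vals[i] - mean_y)
              for i, row in enumerate(cells) for j, p in enumerate(row)) / total
    return var_x, var_y, cov


def sheppard(variance, interval):
    """Sheppard's correction: subtract interval squared over twelve."""
    return variance - interval ** 2 / 12.0


var_x, var_y, cov_xy = grouped_moments(CELLS, X_INDEX, Y_INDEX)
raw_r = cov_xy / math.sqrt(var_x * var_y)
corr_var_x = sheppard(var_x, INTERVAL)
corr_var_y = sheppard(var_y, INTERVAL)
corrected_r = cov_xy / math.sqrt(corr_var_x * corr_var_y)
```

The corrected correlation remains well above the true .80, which is the point made in the text: with only three classes Sheppard's correction overshoots.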

Corrections when class means are the variable:

The class means given in Tables X H and X I are
computed by formula [8:27]. Let these, which we
will indicate by the subscript m, constitute the
variables, and let the preceding subscript c'
indicate a correction for coarseness of grouping.
Computation from Table X H yields:

$V_{y_m} = .92261$; $V_{x_m} = .92261$; $c_{x_m y_m} = .68662$; $r_{x_m y_m} = .74417$

From Table X I we obtain,

$V_{y_m} = .73807$; $V_{x_m} = .73807$; $c_{x_m y_m} = .45395$; $r_{x_m y_m} = .61505$

Clearly corrections to variance, covariance, and
correlation are demanded if the problem is such
as to make corrected results usable and inter-
pretable.

Pearson (1913 Inf.) has shown that a correc-
tion based upon the correlation between the contin-
uous variable, X, and its class means, $X_m$ (and simi-
larly for the second variable) is useful in cor-
recting variance, covariance, and correlation.
This correlation, as proven by Kelley (1923) is

$$r_{x x_m} = \frac{\sigma_{x_m}}{\sigma_x}$$   [10:91]

Correlation between a variate and the means of
the classes into which it is recorded

Thus

$${}_{c'}V_{x_m} = \frac{V_{x_m}}{r^2_{x x_m}}$$   [10:92]

This formula provides a "perfect" correction for
the variance. It, in fact, begs the question for
all it asserts is that if one knows the true
variance the correction to the class means vari-
ance is such as to produce the true variance.
Also, as proven by Kelley (1923),

$${}_{c'}r_{x_m y_m} = \frac{r_{x_m y_m}}{r_{x x_m}\, r_{y y_m}} = r_{x_m y_m}\,\frac{\sigma_x \sigma_y}{\sigma_{x_m}\sigma_{y_m}}$$   [10:93]

Coarseness of grouping correction to r on
account of class means

If $\sigma_x = \sigma_y = 1$, as when a unit normal distri-
bution is assumed, [10:93] becomes

$${}_{c'}r_{x_m y_m} = \frac{r_{x_m y_m}}{\sqrt{V_{x_m} V_{y_m}}}$$   [10:94]

The corrected covariance is,

$${}_{c'}c_{x_m y_m} = \frac{c_{x_m y_m}}{r^2_{x x_m}\, r^2_{y y_m}}$$   [10:95]

Coarseness of grouping correction to covariance
on account of class means

Applying these corrections to the data of
Table X H, we obtain:

${}_{c'}V_{x_m} = 1.0$; ${}_{c'}V_{y_m} = 1.0$; ${}_{c'}c_{x_m y_m} = .80654$; ${}_{c'}r_{x_m y_m} = .80654$

Applying these corrections to the data of
Table X I we obtain:

${}_{c'}V_{x_m} = 1.0$; ${}_{c'}V_{y_m} = 1.0$; ${}_{c'}c_{x_m y_m} = .83332$; ${}_{c'}r_{x_m y_m} = .83332$

Thus again we find correction for coarseness of
grouping quite adequate in case we have seven
classes, but not in case of three classes.
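
The class means correction can be traced numerically for Table X I. The sketch below assumes the cell proportions and class means (-1.5251, 0, 1.5251) as read from that table, a true variance of 1 for each variable, and the correction in the form: correlation between a variate and its class means equals the ratio of their standard deviations, the raw correlation then being divided by the two such ratios and the raw covariance by their squares. Names are illustrative only.

```python
import math

# Cell proportions of Table X I; rows are Y classes from high to low,
# columns are X classes from low to high (an assumed reading of the layout).
CELLS = [
    [0.000056, 0.060963, 0.097637],
    [0.060963, 0.560760, 0.060963],
    [0.097637, 0.060963, 0.000056],
]
Y_MEANS = [1.5251, 0.0, -1.5251]
X_MEANS = [-1.5251, 0.0, 1.5251]
TRUE_VARIANCE = 1.0  # unit normal distribution assumed for X and for Y

total = sum(p for row in CELLS for p in row)
# Class means of a symmetrical distribution average to zero, so the
# moments below are taken about zero.
var_xm = sum(p * X_MEANS[j] ** 2 for row in CELLS for j, p in enumerate(row)) / total
var_ym = sum(p * Y_MEANS[i] ** 2 for i, row in enumerate(CELLS) for p in row) / total
cov_m = sum(p * X_MEANS[j] * Y_MEANS[i]
            for i, row in enumerate(CELLS) for j, p in enumerate(row)) / total
r_m = cov_m / math.sqrt(var_xm * var_ym)

# Correlation of each variate with its own class means.
r_x_xm = math.sqrt(var_xm / TRUE_VARIANCE)
r_y_ym = math.sqrt(var_ym / TRUE_VARIANCE)

corrected_var_x = var_xm / r_x_xm ** 2
corrected_r = r_m / (r_x_xm * r_y_ym)
corrected_cov = cov_m / (r_x_xm ** 2 * r_y_ym ** 2)
```

As the text remarks, the corrected variance comes back to the true variance by construction, while the corrected correlation for three classes still stands above .80.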

We conclude that whether we employ class in-
dexes or class means we have no clearly estab-
lished procedure for corrections to variance,
covariance, correlations, and regressions, for
coarseness of grouping if the number of classes
is 3, 4, or 5. However, even with 3, 4, or 5
classes the correction formulas just given yield
statistics which are closer to the true values
than are the raw statistics.

Let us now classify the situations in which 
corrections for coarseness of grouping may be 
employed. If the problem is one of actual pre- 
diction and the predictor is only available, and 
in the future will be only available, in the 
coarse grouping no correction for coarseness of 
grouping in it should be made. If the predictand
is only available in the coarse grouping and no 
utility attaches to a finer grouping, no correc- 
tion for coarseness of grouping should be made 
in this variable. If the predictand, though only 
available in the coarse grouping, is the phenom- 
enological expression of a noumenal continuum, 
knowledge of which would be of greater value 
than of the actual phenomena themselves, then a 
correction of the predictand for coarseness of 
grouping may be desirable. It is desirable if 
the relationship is substantial and the sample 
large so that no test of significance is needed, 
but it is not desirable if some sufficiently moot
null hypothesis is postulated that a precise 
test is important. Finally, if both predictand 
and predictor are limited expressions of contin- 
uous variables, knowledge of the relationship 
between which answers an important theoretical 
issue, then corrections for coarseness of grouping
in both variables are advisable, provided a need
for a null hypothesis test is not primal. 


Further corrections, especially for attenua-
tion, for range, and for nonlinearity are con-
sidered in later chapters.


SECTION 14. CORRELATIONS AND OTHER STATISTICS 
OF SUMS AND DIFFERENCES 


If

$$X_\alpha = w_1X_1 + w_2X_2 + \cdots + w_kX_k$$   [10:96a]

A weighted sum

we may deal with deviations from means and write

$$x_\alpha = w_1x_1 + w_2x_2 + \cdots + w_kx_k$$

Letting $\overset{k}{S}$ stand for a summation of k terms as i
varies from 1 to k and letting $\overset{k(k-1)}{S}$ stand for
a summation of k(k-1) terms (i.e., the number of
permutations of k things two at a time), as i and
j vary from 1 to k under the restriction that
i ≠ j, it is readily derived that

$$V_\alpha = \overset{k}{S}\, w_i^2 V_i + \overset{k(k-1)}{S}\, w_i w_j \sigma_i \sigma_j r_{ij}$$   [10:97]

For a second weighted sum we use distinctive
subscripts as follows:

$$X_\beta = w_aX_a + w_bX_b + \cdots + w_tX_t$$   [10:96b]

The covariance between $X_\alpha$ and $X_\beta$ is

$$c_{\alpha\beta} = \overset{kt}{S}\, w_i w_a \sigma_i \sigma_a r_{ia}$$   [10:98]

The correlation between these weighted sums is:

$$r_{\alpha\beta} = \frac{c_{\alpha\beta}}{\sigma_\alpha \sigma_\beta}$$   [10:99]

Correlation between a first and a second weighted
sum

If all the measures which combined yield $X_\alpha$
are similar, as for example, would be the case
if they are similar forms of some test, and if
the same holds for the measures which are combined
to yield $X_\beta$ we have approximately


$\sigma_1 = \sigma_2 = \cdots = \sigma_k$

$r_{12} = r_{13} = \cdots$

$\sigma_a = \sigma_b = \cdots = \sigma_t$

$r_{ab} = r_{ac} = \cdots$


We use $r_{ij}$ to designate the average of the k(k-1)/2
correlations $r_{12}$, $r_{13}$, etc., we use $r_{ab}$ to desig-
nate the average of the t(t-1)/2 correlations
$r_{ab}$, $r_{ac}$, etc., and we use $r_{ia}$ to designate the
average of the kt correlations $r_{1a}$, $r_{1b}$, etc.
We immediately obtain,


Pere PNG 


Correlation between sums, 
I Sach consisting of equally 
kt f welghted measures 
D Dur e 00: 100] 


Vk+k(k-1 рет үе ЭЕ: 


As an approximation to [10:100] we may write 
[10: 101] 


Approximate correlation be- 
tween sums when the measures 
entering each are similar, 


his mue 
DS equally welghted, [10:101] 


1), Vttt(t-l)r,, iy varia 
e 


Precise equality of standard deviations may be 
brought about by normalizing measures of a series, 
or by ranking them, for then each variable will 
have rank scores from 1 to N. 

Let us consider the case in which $x_\alpha$ consists
of a single measure which we will now call $x_0$
and in which $x_\beta = x_1 + x_2 + \cdots + x_k$, each of
$x_1$, $x_2$, ... $x_k$ being a series of ranks from 1 to
N. Then we can readily compute $V_\beta$ from the data
and obtain $\rho_{ij}$ from [10:97] which in this case
may be written

$$V_\beta = V_1\,[k + k(k-1)\rho_{ij}]$$   [10:102]

Yielding the average rank intercorrelation
from the variance of a sum of ranks

Further 


- К oy Pot 9 Цаа! ees, (10: 103] 
ge. о : 
2 c 12 Ta 


a 


Yielding the average correlation of the separate parts of
a sum (of measures which are ranks) with the criterion
from the correlation of the sum with the criterion


Working with raw ranks we have

$$V_\beta = \frac{\sum X_\beta^2}{N} - \left[\frac{k(N+1)}{2}\right]^2$$   [10:104]

The variance of N measures each being the sum of
k rank scores

Combining [10:102] and [10:104] we obtain

$$\rho_{ij} = 1 - \frac{2k(2N+1)}{(k-1)(N-1)} + \frac{12\sum X_\beta^2}{k(k-1)N(N^2-1)}$$   [10:105]

Average intercorrelation between k series of N ranks each

The variance error of $\rho_{ij}$ is


2 J^v(V, — 110:1054) 


162 Ее. 
(N?-1)k(k-1)
The following problem illustrates the use of
this formula. Six judges, K, T, U, B, L, H rank
according to merit twelve answers to a given
problem as follows:

TABLE X J


[Ranks assigned by the six judges to the twelve
answers, with the sum of ranks for each answer.
The individual entries are illegible in this copy.]


k = 6, N = 12, $\sum X_\beta^2$ = 20,500,
therefore, by [10:105], $\rho_{ij}$ = .3240.
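
The arithmetic of this problem can be followed in a few lines. The sketch below assumes the average intercorrelation between k series of N ranks to be 1 - 2k(2N+1)/((k-1)(N-1)) + 12 ΣX²/(k(k-1)N(N²-1)), where ΣX² is the sum of the squared rank totals of the N objects, and uses the values k = 6, N = 12 and ΣX² = 20,500 stated in the text. The function names are illustrative only.

```python
def average_rank_intercorrelation(sum_of_squares, k, n):
    """Average intercorrelation of k rank series from the sum of squared rank totals."""
    return (1.0
            - 2.0 * k * (2 * n + 1) / ((k - 1) * (n - 1))
            + 12.0 * sum_of_squares / (k * (k - 1) * n * (n * n - 1)))


def sum_of_squared_totals(rank_rows):
    """Sum of squared rank totals; each row holds one judge's ranks of the N objects."""
    totals = [sum(column) for column in zip(*rank_rows)]
    return sum(t * t for t in totals)


# Values stated in the text for the six judges and twelve answers.
rho = average_rank_intercorrelation(20500, k=6, n=12)

# Hypothetical check: two judges in complete agreement on three objects.
agreeing = [[1, 2, 3], [1, 2, 3]]
rho_perfect = average_rank_intercorrelation(sum_of_squared_totals(agreeing), k=2, n=3)
```

The first call reproduces the .3240 of the text; the second shows that complete agreement among judges gives an average intercorrelation of 1.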
We note that if k and t each approach infinity
$r_{\alpha\beta}$ is simply a coefficient corrected for attenu-
ation and [10:100] becomes [11:25].


If the weighted sum [10:96a] is simply $X_1 - X_2$,
which we may call $X_d$, thus

$$X_d = X_1 - X_2$$

then [10:97] reduces to

$$V_d = V_1 + V_2 - 2\sigma_1\sigma_2 r_{12}$$   [10:106]

The variance of a difference

which is a formula of wide utility.


CHAPTER XI 
FURTHER CORRELATION ISSUES 


SECTION 1. THE VARIOUS CONSEQUENCES OF EMPLOYING 
SEMI-RELIABLE INITIAL MEASURES 


Classic sampling theory has looked upon the N
scores, $X_1$, in a sample as randomly selected from
a parent population and has been concerned with
the problem of estimating the characteristics of
the population from the information revealed by
the sample. The observed record or score of a
single individual, or case, a, is considered to
be an utterly trustworthy record. The concept
of error attaches to $X_{1a}$ only when $X_{1a}$ is taken
as an estimate of a population statistic, say
of $\tilde{M}_1$. Also a squared difference $(X_{1a} - X_{1b})^2$ is
exactly the squared difference
between the scores of individuals a and b and as
such has no error. Only when this squared dif-
ference is taken as evidence of the population
variance of differences does the concept of er-
ror attach to it.

The concept of reliability of original meas-
ures is utterly other than this. The number of
kernels on an ear of corn from a certain stalk
is not a perfect measure of the fruitfulness per 
ear of the stalk, if it has a second ear, be- 
cause the number of kernels of this second ear 
may differ. The price of a wheat product at a 
certain date, relative to the price at a basic 
date, is not a perfect wheat index, for a dif- 
ferent wheat product may yield a different index. 
The score of a pupil on a reading test is not a 
perfect measure of his reading ability for a 
second reading test given under similar condi- 
tions may reveal a different relative standing. 
The concept "reliability," or "reliability of 
measurement," concerns the phenomena of differ- 
ences in the measurements obtained when the in- 
dividual is doubly or multiply measured by means 
of instruments believed and intended to tap the 
same function. This concept has proven very 
useful in psychological and educational fields, 
where it has been widely employed, and we may 
anticipate a similar utility if widely used in 
other social fields and in the biological and 
physical sciences. In the physical sciences the
concept of unreliability of original observations 
has a long and honorable history, being subsumed 
under the topic, "errors of observation," but 
the correlational consequences of these errors 
have not been pursued in this field to the extent 
that they have been in the field of mental meas-
urement.

The concept only arises when two or more sup- 
posedly similar measures of the same thing are 
found to differ. Consider the case of an econo- 
mist who has selected 40 products to incorpo- 
rate into a cost of living index. These 40 
items have been conceived by him as in some es- 
sential sense being expressions of a single phe- 
nomenon,— cost of living. His judgment has been 
employed in their selection and weighting, if 
differentially weighted. In combining the items
into a single final score, he has either thought
that the record of each item was an expression 
of this common function-plus-chance, or of this 
common function-plus-a-non-chance-unique-func- 
tion-plus-chance. Thus for the recorded score 
on item i of the index we have, 


ху хх + ет, An жс 
X, = the cost of living function 
X, = a non-chance, non-cost-of-living, func- 


tion unique to item i, that is, x, is 
not found (except as a matter of chance) 
in anyof the other 39 itemsofthe index. 


! 


an unaccountable, chance, or error in- 
fluence that has affected x,. It will 
correlate with no other function (except 
as a matter of chance). 


T 


The unique function, being unique, has the same
correlational properties as the chance factor,
so we combine them and designate the combination
$e_i$. In brief, $x_\infty$ is a real, or true, measure of
the cost of living of which $x_i$ is a fallible
measure, and we write

$$x_i = c_i x_\infty + e_i$$

the $c_i$ being a constant depending upon the par-
ticular units of measurement which have been em-
ployed. For a second item we have

$$x_j = c_j x_\infty + e_j$$

and similarly for further items. Any linear
weighted combination of these 40 items can be
written

$$x_1 = x_\infty + e_1$$   [11:01]

A fallible measure as a function of a true meas-
ure plus a factor operating like a chance factor

For some purposes we may find it convenient to
write [11:01] thus


ох ие NE. . [11:02] 
which differs from [11:01] only in the units of 
measurement and not in the inherent relationship 
portrayed. 

The expression for $x_1$ asserts that the econo-
mist conceives of his index as measuring nothing
but cost of living plus irrelevant or chance
factors. This attitude toward the total score,
$x_1$, extends to the separate items, $x_i$, which have
entered into it. He can therefore split the to-
tal score into halves. It is customary to repre-
sent the half scores with compound subscripts,
but to simplify subscript notation we shall here
call the half scores $x_I$ and $x_{II}$.


$$x_I = .5x_\infty + e_I$$
$$x_{II} = .5x_\infty + e_{II}$$   [11:03]

Scores on the halves of $x_1$

In these equations the error factors are uncor-
related, being either strictly chance factors or
having unique elements which behave like chance
factors. Of course $x_\infty$ is uncorrelated with
either chance factor. For the total measure we
have

$$x_1 = x_I + x_{II} = x_\infty + e_I + e_{II}$$   [11:01a]


If an x, measure is involved, we shall designate 


XC X Ax эсэх ац сг на нг LLEOL]
Also we shall designate a criterion measure
хол Жа Е le Ay] 


Let us estimate as fully as possible the sta-
tistics of $x_\infty$ from a knowledge of $x_I$, $x_{II}$, and $x_1$.
The variance of the differences between the half
scores is

$$V_{(x_I - x_{II})} = V_I + V_{II} - 2c_{I\,II} = V_{e_I} + V_{e_{II}}$$   [11:05]

which is a function of the errors of measurement
only. This gives us Rulon's (1939) formula for
the standard error of measurement:

$$\sigma_{1\cdot\infty} = \sigma_{(x_I - x_{II})}$$   [11:06]

Standard error of estimate when non-regressed $x_1$
is taken as evidence of $x_\infty$

The notation $\sigma_{1\cdot\infty}$ in this connection requires
explanation. If one investigates the scatter
diagram whose dimensions are $x_1$ and $x_\infty$, he can
readily prove that the standard deviation of
arrays of $x_1$ for fixed values of $x_\infty$, the standard
notation for which is $\sigma_{1\cdot\infty}$, is exactly that given by
[11:05] and [11:06]. Accordingly, $\sigma_{1\cdot\infty}$ may
correctly be used to indicate the standard error
when $x_1$ is estimated from a knowledge of $x_\infty$ (a
process actually never performed), or when $x_\infty$ is
estimated by using non-regressed $x_1$.

The customary way to split a measure composed 
of many items into halves is to call the odd- 
numbered items one half and the even-numbered 
items the other half. This is frequently entirely 
satisfactory, but thought should be given to the 
matter, for if a different splitting yields 
halves which, in the judgment of the experimenter 
are more nearly comparable, then this different 
splitting should be employed. Surely in a cost-
of-living index all the semi-luxury items should
not be assigned to the same half score, just as
in a mental test all the hard items should not
be concentrated in a single half. A priori con-
siderations may suggest that some items have
larger relative chance factors than others, and
if so they should be judiciously distributed be-
tween the halves, etc. The writer (1942) would
emphasize that there should be full utilization
of and dependence upon judgment in splitting a
measure into halves, just as there has been such
dependence upon it when drawing up the instru- 
ment in the first instance. It is in fact de- 
sirable that the process of making the instru- 
ment be one of making comparable halves. 

The suggestion has been made that a measure
of 2t items can be split into halves in $C^{2t}_{t}$ ways
and that the best procedure would be to try out
all these ways (when computing reliability coef-
ficients) and find the average result. Theore-
tically this would seem to the writer sound only
in the rare instance in which the devisor con-
sidered all items (a) equally excellent, (b)
equally subject to chance, and (c) all functioning
at the same level. Kuder and Richardson (1937,
1939) have devised formulas for the computation
of reliability coefficients when these condi-
tions hold. Their more elaborate and precise
formulas scarcely constitute an economy of labor
over the formulas for the reliability coeffi-
cient given in this section, though it is true
that they avoid the perplexing question of how to
split a test into halves. Their simplest formula
for a test (K-R write in terms of mental test
problems) the items of which are scored "right,"
or "wrong," is

$$r_{11} = \frac{t}{t-1}\cdot\frac{V_1 - t\bar{p}\bar{q}}{V_1}$$   [11:07]

Kuder-Richardson short formula for the reliability coefficient

in which t is the number of items in the test,
$\bar{p}$ is the mean item success score and $\bar{q} = 1 - \bar{p}$.
When the special conditions (a), (b), and (c)
maintain, this reliability coefficient is the
same as that given by [11:10], and it, rather
surprisingly, approximates this answer for quite
a range of common situations.
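
The short Kuder-Richardson formula needs only the total scores and the number of items. The sketch below assumes the formula in the form (t/(t-1)) (V - t p q)/V, where V is the variance of the total scores (divisor N), p is the mean score divided by the number of items, and q = 1 - p. The function name and the scores are hypothetical.

```python
from statistics import mean, pvariance


def kuder_richardson_short(total_scores, n_items):
    """Reliability from total scores on a test of n_items right-wrong items."""
    variance = pvariance(total_scores)          # variance of total scores
    p = mean(total_scores) / n_items            # mean item success score
    q = 1.0 - p
    return (n_items / (n_items - 1)) * (variance - n_items * p * q) / variance


# Hypothetical total scores of four examinees on a ten-item test.
reliability = kuder_richardson_short([2, 4, 6, 8], n_items=10)
```

For these scores the mean is 5, so p = q = .5, the variance is 5, and the coefficient is (10/9)(5 - 2.5)/5.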

The standard error of estimate, given by
Rulon's formula, is the statistic needed in order
to attach the proper degree of confidence to an
original measure. The numerical steps are
straight-forward and also contribute to further
statistics which are commonly needed. Simply
record in four columns the values of $x_I$, $x_{II}$,
($x_I - x_{II}$), and $x_1$ (= $x_I + x_{II}$), and compute the vari-
ance of each column. In addition to the essential
variance error of estimate, $V_{(x_I - x_{II})}$, we can now
readily obtain C. S. Spearman's reliability coef-
ficient, which is the correlation between simi-
lar measures. Referring to the sum and differ-
ence formula for correlation we have

$$r_{I\,II} = \frac{V_1 - V_{(x_I - x_{II})}}{4\sigma_I\sigma_{II}}$$   [11:08]

The half-measure reliability via differences
between half scores
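
The four-column procedure just described may be sketched as follows. The sketch assumes the standard error of measurement to be the standard deviation of the differences between half scores, the half-measure reliability to be the variance of the sums less the variance of the differences, divided by four times the product of the two half-score standard deviations, and the step-up to the full measure to be 2r/(1 + r). The names and the half scores are hypothetical.

```python
from math import sqrt


def pvar(values):
    """Variance with divisor N."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def split_half_statistics(half_one, half_two):
    """Return (standard error of measurement, half reliability, total reliability)."""
    totals = [a + b for a, b in zip(half_one, half_two)]
    diffs = [a - b for a, b in zip(half_one, half_two)]
    v_total, v_diff = pvar(totals), pvar(diffs)
    se_measurement = sqrt(v_diff)
    r_half = (v_total - v_diff) / (4.0 * sqrt(pvar(half_one) * pvar(half_two)))
    r_total = 2.0 * r_half / (1.0 + r_half)
    return se_measurement, r_half, r_total


# Hypothetical half scores of five examinees.
odd_half = [1, 2, 3, 4, 5]
even_half = [2, 1, 4, 3, 5]
se_meas, r_half, r_total = split_half_statistics(odd_half, even_half)
```

Here the variance of the sums is 7.2 and that of the differences is .8, so the half-measure reliability is 6.4/8 = .8, the same value that a direct product moment computation on the two halves gives.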


We desire $r_{11}$, or as frequently designated $r_{1I}$,
the correlation between $x_1$ and a postulated simi-
lar measure. We shall obtain this by first
deriving the Spearman-Brown step-up formula
giving the reliability of a measure n times as
long as the measure whose reliability has been
experimentally determined. Usually one has com-
puted $r_{I\,II}$ and desires $r_{11}$, in which case n = 2, for
$x_1$ is the sum of the two similar measures $x_I$ and
$x_{II}$, the reliability of which is the computed
value $r_{I\,II}$. Let

$$x_\alpha = x_1 + x_2 + \cdots + x_n$$

be a measure which is the sum of n similar parts,
and let $x_\beta$ be a measure similar to $x_\alpha$:

$$x_\beta = x_{n+1} + x_{n+2} + \cdots + x_{2n}$$

All parts of $x_\alpha$ and of $x_\beta$ are similar.

$$r_n = r_{\alpha\beta} = \frac{n^2 c_{12}}{nV_1 + (n^2 - n)c_{12}} = \frac{n r_{12}}{1 + (n-1) r_{12}}$$   [11:09]

Spearman-Brown step-up formula

Solving [11:09] for $r_{12}$ we obtain the step-down
formula [11:09a]

$$r_{12} = \frac{r_n}{n - (n-1) r_n}$$   [11:09a]

Solving [11:09] for n we obtain [11:09b]

$$n = \frac{r_n (1 - r_{12})}{r_{12} (1 - r_n)}$$   [11:09b]

giving n when $r_{12}$ is known and some reliability
$r_n$ is desired. The variance error of n thus
derived is


For the step-up from the half measure reliability,
[11:09] becomes,

$$r_{11} = \frac{2 r_{I\,II}}{1 + r_{I\,II}}$$   [11:10]

The variance error of $r_n$ is consequent to the
error in $r_{12}$ and by the method of Chapter XIII,
Section 8, we obtain Shen's (1924) formula for
the standard error of $r_n$:

$$\sigma_{r_n} = \frac{n\,\sigma_{r_{12}}}{[1 + (n-1) r_{12}]^2}$$   [11:11]

The standard error of a stepped-up reliabil-
ity coefficient
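
The step-up, step-down, and length relations, together with the standard error of a stepped-up coefficient, can be gathered into a few small functions. The sketch assumes the step-up in the form n r/(1 + (n-1) r), where r is the reliability of the unit measure, and the standard error of the stepped-up coefficient as n times the standard error of r divided by the square of 1 + (n-1) r. The function names and the numerical values are illustrative only.

```python
def step_up(r_unit, n):
    """Reliability of a measure n times as long as the unit measure."""
    return n * r_unit / (1.0 + (n - 1) * r_unit)


def step_down(r_long, n):
    """Reliability of the unit measure from that of a measure n times as long."""
    return r_long / (n - (n - 1) * r_long)


def length_needed(r_unit, r_desired):
    """Number of unit measures required to reach a desired reliability."""
    return r_desired * (1.0 - r_unit) / (r_unit * (1.0 - r_desired))


def stepped_up_standard_error(se_r_unit, r_unit, n):
    """Standard error of the stepped-up coefficient from that of the unit r."""
    return n * se_r_unit / (1.0 + (n - 1) * r_unit) ** 2


# Illustrative values: half-measure reliability .60 stepped up to the total.
r_total = step_up(0.60, 2)
r_back = step_down(r_total, 2)
n_required = length_needed(0.60, r_total)
se_total = stepped_up_standard_error(0.08, 0.60, 2)
```

A half-measure reliability of .60 steps up to .75 for the full measure, and the step-down and length functions recover .60 and n = 2 from that result.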


The standard error of $r_{I\,II}$ is given by [10:46].
Utilizing this together with [11:10], we find
that when the step-up is from the half score to
the total score, then [11:11] becomes

$$\sigma_{r_{11}} = \frac{2(1 - r_{I\,II}^2)}{(1 + r_{I\,II})^2 \sqrt{N-2}}$$   [11:12]

Standard error of $r_{11}$ here derived via half
scores. (Here N is number of cases in the sample.)


Utilizing relationships [11:05], [11:10], and [11:13],

$$V_1 = V_{\frac{1}{2}} + V_{\frac{I}{2}} + 2c_{\frac{1}{2}\frac{I}{2}} = 2V_{\frac{1}{2}}\left(1 + r_{\frac{1}{2}\frac{I}{2}}\right)$$
Variance of the sum of two similar measures  [11:13]

we obtain

$$V_{1.\infty} = V_1(1 - r_{1I})$$
Variance error of estimate as a function of the variance
and reliability coefficient  [11:14]


Illustrations from the field of mental measurement of
functions involving the reliability coefficient: A score
on a psychological test is fallible and may be set equal
to a true ability plus an error, as indicated in [11:01].
We clearly desire as much information as possible about
the true ability, $X_\infty$, having the observed test
score, $X_1$. Dealing with gross scores, rather than
deviation measures, we write,

$$X_1 = X_\infty + e_1$$
Observed = true + error  [11:01]

the $e_1$'s being such that their mean for a large sample
tends towards zero. This must be so, for otherwise there
is something in the $e_1$'s which is not chance, and this
would then be a part of $X_\infty$. Accordingly the mean
of the $X_1$'s is equal to the mean of the $X_\infty$'s
within limits represented by the variance error of $M_1$
for given values of $X_\infty$, namely, $V_{1.\infty}/N$.
This variance error is only 1/N as large as that attaching
to the individual score, so we may, unless N is very
small, interpret individual scores on the basis that

$$M_\infty = M_1$$
The mean of true ability = the mean of observed ability  [11:15]


Dealing with deviation scores, we can obtain
the variance of both members of [11:01], utilize
[11:14] and obtain

$$V_\infty = V_1 r_{1I}$$
An estimate of the variance of true ability  [11:16]

and

$$\sigma_\infty = \sigma_1\sqrt{r_{1I}}$$
An estimate of the standard deviation of true ability  [11:16a]


This and further relationships given involving
"true ability" are strictly true in the population
and approximately so in the sample. We note that the
observed variability in a group tends to be greater
than the true variability. The difference with
instruments of low reliability may be so great as
materially to change the picture of events. For a
certain reading test, with reliability of about .36,
an investigator found that some 47 per cent of
fifth-grade pupils fell below the fourth-grade mean or
above the sixth-grade mean, thus indicating serious
misgrading or misclassification. Utilizing [11:16]
and assuming a normal distribution, it is simple
to compute the percentage which, in true ability,
lies outside these same limits. The answer is 23,
only half the percentage given by the raw scores.
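
The reading-test figure can be checked with a short computation. This is a sketch under stated assumptions: the two limits are taken as symmetric about the fifth-grade mean, scores are normal, and the standard deviation of true ability is the observed standard deviation times the square root of the reliability, as in [11:16a]. The function name is illustrative.

```python
from statistics import NormalDist


def true_fraction_outside(observed_fraction, reliability):
    """Fraction whose true ability lies outside limits that cut off
    `observed_fraction` of the observed scores (symmetric limits assumed)."""
    std_normal = NormalDist()
    # Distance of each limit from the mean, in observed standard deviations.
    z_observed = std_normal.inv_cdf(1.0 - observed_fraction / 2.0)
    # The same distance in true-ability standard deviations, using [11:16a].
    z_true = z_observed / reliability ** 0.5
    return 2.0 * (1.0 - std_normal.cdf(z_true))


# Reliability .36 and 47 per cent observed outside the limits, as in the text.
print(round(100 * true_fraction_outside(0.47, 0.36)))
```

With a reliability of 1.00 the function returns the observed fraction unchanged, as it should.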
We can estimate an individual's $X_\infty$ from his $X_1$:


1 
быу id о [11:17] 


X 


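Obtaining $\sigma_\infty$ from [11:16a] we find that

$$r_{1\infty} = \sqrt{r_{1I}}$$
Correlation between a true and a fallible measure
of the same function  [11:18]

$$\bar{x}_\infty = r_{1\infty}\frac{\sigma_\infty}{\sigma_1}x_1 = r_{1I}x_1$$  [11:19]

$$\bar{X}_\infty = r_{1I}X_1 + (1 - r_{1I})M_1$$
Regression estimate of true ability  [11:20]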
This is an interesting equation in that it expresses
the estimate of true ability as a weighted sum of two
separate estimates, one based upon the individual's
observed score, $X_1$, and the other based upon the mean
of the group to which he belongs, $M_1$. If the test is
highly reliable, much weight is given to the test score
and little to the group mean, and vice versa. Suppose
fourth-grade pupil A and fifth-grade pupil B each score
45 on a test having a reliability of .80 in each grade,
and that the means and standard deviations for the grades
are: $M_4 = 40$; $\sigma_4 = 10$; $M_5 = 50$; and
$\sigma_5 = 10$. For pupil A we estimate his true ability
thus:

$$\bar{X}_\infty = .80\,(45) + .20\,(40) = 44$$

For pupil B

$$\bar{X}_\infty = .80\,(45) + .20\,(50) = 46$$
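
The two estimates follow directly from [11:20]. A minimal sketch, with an illustrative function name:

```python
def regressed_true_score(observed, group_mean, reliability):
    """Weighted sum of the observed score and the group mean, as in [11:20]."""
    return reliability * observed + (1.0 - reliability) * group_mean


# Pupils A and B of the text: same score, same reliability, different grades.
pupil_a = regressed_true_score(45, group_mean=40, reliability=0.80)
pupil_b = regressed_true_score(45, group_mean=50, reliability=0.80)
print(round(pupil_a, 2), round(pupil_b, 2))
```

The reliability acts as the weight on the individual's own score, so a perfectly reliable test ignores the group mean and a wholly unreliable one returns it.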


This difference in outcome is certainly sound.
We know two things about pupil A: the first
fact ($X_1 = 45$) suggests a true ability of 45, and
the second fact (member of group whose mean = 40)
suggests a true ability of 40. The best composite
of his ability is 44, as given by [11:20].
Suppose for the single hour when tested pupil A
had sat with the fifth grade, would we now use
the fifth-grade mean and estimate his true ability
as 46? Certainly not, for pupil A is still a
fourth-grader. This group membership is not a
whim, but a thing as definitely attached to pupil
A as is his score 45.

The variance of the estimated true scores for
the entire group is clearly

$$V_{\bar{\infty}} = r_{1I}^2 V_1$$
Variance of estimated true scores (see [11:22])  [11:21]

which we note is less than $V_\infty$. The relationship
$V_1 > V_\infty > V_{\bar{\infty}}$ is important.

The variance error of an estimated true score
is given by the usual formula [10:09], which for
this situation becomes

$$V_{\infty.1} = V_1(r_{1I} - r_{1I}^2)$$
Variance error of estimate of a true score,
$x_\infty$, from $x_1$  [11:22]

This is less than the variance error of estimate
given by [11:14]. The difference in the situations
must be explicitly noted. When $x_1$ is taken
as evidence of $x_\infty$, the variance error of
estimate is given by [11:14], but when $x_1$ is
regressed so that $r_{1I}x_1$ is taken as evidence of
$x_\infty$, an improved estimate is gotten and the
variance error of estimate is now given by [11:22]. If
the mean and reliability for the group to which
the tested person naturally belongs are known, it
is always preferable to use the regressed score
as the estimate of true ability. Since this best
practice is infrequent practice, the pertinent
variance error of estimate is usually that given
by [11:14].
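
The two variance errors can be set side by side. The sketch assumes the forms $V_1(1 - r)$ for [11:14] and $V_1(r - r^2)$ for [11:22]; the function names and the example figures (a standard deviation of 10 and a reliability of .80) are illustrative.

```python
def variance_error_raw(variance, reliability):
    """Variance error when the raw score is taken as the true score ([11:14])."""
    return variance * (1.0 - reliability)


def variance_error_regressed(variance, reliability):
    """Variance error when the regressed score is used instead ([11:22])."""
    return variance * (reliability - reliability ** 2)


variance = 10.0 ** 2   # hypothetical observed variance
for reliability in (0.30, 0.50, 0.80, 0.95):
    raw = variance_error_raw(variance, reliability)
    regressed = variance_error_regressed(variance, reliability)
    print(reliability, round(raw, 2), round(regressed, 2))
```

The regressed figure is the raw figure multiplied by the reliability, so the gain from regressing is largest for instruments of low reliability.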

Formula [11:21] throws interesting light upon
the classification of individuals by fallible
measures. Suppose upon a scholastic test we have
fourth, fifth, and sixth-grade means of 40, 50,
and 60, and that $\sigma_\infty$ for the fifth grade is 10.
Assume a normal distribution of ability and a
rule which demotes fifth-graders who score below
40 and promotes fifth-graders scoring above 60.
If the reliability of the test is 1.00, we obtain
the correct result of 16 per cent below 40 and
16 per cent above 60, and thus we reclassify 32
per cent of the pupils. If the test has a
reliability of .50, we find with the aid of [11:16a]
that $\sigma_1 = 14.14$. Employing raw scores, reference
to a normal probability table informs us that we
would now reclassify 48 per cent, which is an
excessive number. If, however, we use regressed
scores, ($\bar{x}_\infty = r_{1I}x_1$), we have a
distribution whose standard deviation is 7.07 (from
[11:21]) and reclassify 16 per cent, which is a
conservative number. It can also be shown, using
volumes of a normal bivariate surface as tabled in
Pearson (1931), that of the 48 per cent reclassified
upon the basis of raw scores, 26/48 did not in truth
fall beyond the limits set, and that of the 16 per
cent reclassified upon the basis of regressed
scores, 6/16 did not in truth fall beyond these
limits. In short, the use of a fallible measure
at its face value in connection with promotions,
classification, etc., will lead to or create many
misplacements, while the use of this same fallible
measure properly regressed will create few
misplacements. If we will but regress scores and
compute standard errors of estimated true scores,
we need not hesitate to use an instrument of low
reliability.
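
The three percentages in the classification example can be reproduced from the normal distribution. The sketch assumes a fifth-grade mean of 50, limits at 40 and 60, and the standard deviations given by [11:16a] and [11:21]; the function name is illustrative.

```python
from statistics import NormalDist


def percent_reclassified(score_sd, mean=50.0, low=40.0, high=60.0):
    """Per cent of a normal group scoring below `low` or above `high`."""
    scores = NormalDist(mean, score_sd)
    return 100.0 * (scores.cdf(low) + 1.0 - scores.cdf(high))


true_sd = 10.0
reliability = 0.50
observed_sd = true_sd / reliability ** 0.5    # from [11:16a], about 14.14
regressed_sd = reliability * observed_sd      # square root of [11:21], about 7.07

print(round(percent_reclassified(true_sd)))        # perfectly reliable test
print(round(percent_reclassified(observed_sd)))    # raw scores, reliability .50
print(round(percent_reclassified(regressed_sd)))   # regressed scores
```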

The reliability of measures and issues involving two
variables. Let $x_\infty$ be a true criterion, $x_0$ the
fallible criterion of reliability $r_{0I}$, $x_\omega$ a
true predictor, and $x_1$ the fallible predictor of
reliability $r_{1I}$. We have the covariance

$$c_{01} = \frac{1}{N}\sum (x_\infty + e_0)(x_\omega + e_1)
= \frac{1}{N}\left(\sum x_\infty x_\omega + \sum x_\infty e_1 + \sum x_\omega e_0 + \sum e_0 e_1\right)$$

These last three summations are chance deviations
from zero. We can set each = 0, an unbiased estimate
of zero. We accordingly reach the following
unbiased approximation:

$$c_{01} = c_{\infty 1} = c_{0\omega} = c_{\infty\omega}$$  [11:23]

$$r_{\infty 1} = \frac{c_{\infty 1}}{\sigma_\infty\sigma_1} = \frac{c_{01}}{\sigma_0\sqrt{r_{0I}}\,\sigma_1} = \frac{r_{01}}{\sqrt{r_{0I}}}$$
Correlation corrected for attenuation in one variable  [11:24]

By a similar derivation,

$$r_{\infty\omega} = \frac{r_{01}}{\sqrt{r_{0I}\,r_{1I}}}$$
C. S. Spearman's formula for correction for attenuation
in both variables  [11:25]


If reliabilities are computed by finding the 
correlation between comparable half scores and 
Stepping up, [11:10], the variance error of re) 
1s given by formula 25:50) Жапа the: variance 
error of г by [13:88]. 

For the estimate of a true score we have, as
given by [11:19],

$$\bar{x}_\infty = r_{\infty 1}\frac{\sigma_\infty}{\sigma_1}x_1 = r_{01}\frac{\sigma_0}{\sigma_1}x_1$$  [11:26]

Since the estimate of $x_0$ from $x_1$ is

$$\bar{x}_0 = r_{01}\frac{\sigma_0}{\sigma_1}x_1$$

we observe that we reach the same answer in the
two cases. However, this common answer is a more
accurate estimate of the unknown true measure $x_\infty$
than of the fallible measure $x_0$.

$$\sigma_{0.1} = \sigma_0\sqrt{1 - r_{01}^2}$$

as given by [10:49], and

$$\sigma_{\infty.1} = \sigma_\infty\sqrt{1 - r_{\infty 1}^2} = \sigma_0\sqrt{r_{0I} - r_{01}^2}$$
Standard error of estimate of a true criterion,
$x_\infty$, from $x_1$  [11:27]

This is encouraging, for it entitles us to believe,
even if we do not know $r_{0I}$, but do know
that the criterion is fallible, that our actual
estimates of the criterion are more trustworthy
than indicated by $\sigma_0\sqrt{1 - r_{01}^2}$. As
illustration, we can cite the many studies reporting
correlations with teachers' marks. In view of the fact
that the reliability of teachers' marks seldom exceeds
.75, that their average is in the neighborhood
of .50 (lower for character ratings), and not
infrequently are no higher than .30, we are entitled
to attach considerably more significance to
correlations with teachers' marks than has usually
been done.
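
The point about teachers' marks can be illustrated numerically. The sketch assumes the forms $r_{01}/\sqrt{r_{0I}}$ for [11:24], $r_{01}/\sqrt{r_{0I}r_{1I}}$ for [11:25], and $\sigma_0\sqrt{r_{0I} - r_{01}^2}$ for [11:27]. The function names and the observed correlation of .40 are hypothetical; the criterion reliability of .50 is the neighborhood the text mentions.

```python
import math


def corrected_for_one(r_01, criterion_reliability):
    """Correlation of the predictor with the true criterion (form assumed for [11:24])."""
    return r_01 / math.sqrt(criterion_reliability)


def corrected_for_both(r_01, criterion_reliability, predictor_reliability):
    """Correlation between the two true measures (form assumed for [11:25])."""
    return r_01 / math.sqrt(criterion_reliability * predictor_reliability)


def se_estimate_fallible(sd_criterion, r_01):
    """Standard error of estimate of the fallible criterion."""
    return sd_criterion * math.sqrt(1.0 - r_01 ** 2)


def se_estimate_true(sd_criterion, r_01, criterion_reliability):
    """Standard error of estimate of the true criterion (form assumed for [11:27])."""
    return sd_criterion * math.sqrt(criterion_reliability - r_01 ** 2)


# Hypothetical test correlating .40 with marks of reliability .50,
# the marks being expressed in standard-score units.
print(round(corrected_for_one(0.40, 0.50), 3))
print(round(se_estimate_fallible(1.0, 0.40), 3))
print(round(se_estimate_true(1.0, 0.40, 0.50), 3))
```

The last figure is markedly smaller than the one before it, which is the sense in which estimates of a fallible criterion are more trustworthy than the ordinary standard error of estimate suggests.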

The significance of differences between fallible
measures. The raw scores $X_{1a}$ and $X_{2a}$ of
individual a upon two achievement measures are
ordinarily in such units that the raw difference
$X_{1a} - X_{2a}$ is not interpretable. If individual a
belongs to a normally competitive group, a useful
procedure is to express scores as standard scores
using the mean and standard deviation of this
group. Thus,

$$z_{1a} = (X_{1a} - M_1)/\sigma_1$$

and

$$z_{2a} = (X_{2a} - M_2)/\sigma_2$$

Then $z_{1a} - z_{2a}$ expresses the superiority of
individual a in trait 1 to trait 2 relative to the
group standard. A difference measure of this
sort is doubly contaminated with error, that in
$z_{1a}$ and that in $z_{2a}$. Counselors unfamiliar with
this generally large error have foolishly trusted
observed differences, while other unduly conservative
counselors have been prone to distrust all
observed differences and place their trust in
the general-mental-ability doctrine which considers
the level of ability of a person to be the
same in all intellectual functions. The precise
handling of such within-the-individual differences
is not difficult, but as one would expect,
the standard error of the difference $z_{1a} - z_{2a}$ is
not infrequently very large. If this same individual
were tested with comparable forms of the
two measures, his observed difference would be
$z_{Ia} - z_{IIa}$. The correlation between such measures
of difference for the group entire is the reliability
of this difference in standard score
measures. We may designate such a difference as
$z_1 - z_2$ with the symbol $d_{12.\infty\omega}$, and read it
"the difference between $z_1$ and $z_2$ for fixed values
$x_\infty$ and $x_\omega$," or "the difference between $z_1$
and $z_2$ which is independent of $x_\infty$ and $x_\omega$."
This is obviously so, for when the individual is the
same, his true abilities $x_\infty$ and $x_\omega$ remain the
same while the individual is being measured by
$X_1$ or $X_2$.


$$d_{12} = z_1 - z_2$$  [11:28]

$$V(d_{12}) = 2 - 2r_{12}$$  [11:28a]

$$c(d_{12}, d_{I\,II}) = r_{1I} + r_{2II} - 2r_{12}$$  [11:29]

$$r(d_{12}, d_{I\,II}) = \frac{r_{1I} + r_{2II} - 2r_{12}}{2 - 2r_{12}}$$
Reliability of differences between the standard scores
of an individual  [11:29a]

$$d_{12.\infty\omega} = z_{1.\infty} - z_{2.\omega}$$  [11:30]

$$V(d_{12.\infty\omega}) = 2 - r_{1I} - r_{2II}$$
Variance error of individual standard score difference  [11:30a]

Formula [11:29a] shows that only in case $r_{12}$
is zero or negative does the reliability of the
within-the-individual difference compare favorably
with that of the measures employed in obtaining
the difference.
У(4,,) can be analyzed into non-chance and 
chance independent parts, thus: 


а, ,) = [V(d; ,)-V(d; ;. 237) + V(dy;, 252 


VGA Vds uu oto sep reb ШЗ] 


in which 
2 2 
hee rA ОЕ [11:32] 
21 28 
(а д^ ИЕ] 2 (see [11:29]) [11: 32а] 
Accordingly [11:31] may be written

$$2 - 2r_{12} = (r_{1I} + r_{2II} - 2r_{12}) + (2 - r_{1I} - r_{2II})$$  [11:31a]
Total V = real V + chance V


A precise variance ratio test involving $V(d_{12})$
and $V(d_{12.\infty\omega})$ would be valuable, but is lacking
because the restrictions that have been placed
upon the variables have been nonlinear in character.

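Two other measures of difference are important.
They are $d_{\infty\omega}$ and $\bar{d}_{12}$.

$$d_{\infty\omega} = \frac{x_\infty}{\sigma_\infty} - \frac{x_\omega}{\sigma_\omega}
= \frac{x_\infty}{\sigma_1\sqrt{r_{1I}}} - \frac{x_\omega}{\sigma_2\sqrt{r_{2II}}}$$  [11:33]

a non-individually observed difference, but one
whose variance for the group can be computed.
Its variance is

$$V(d_{\infty\omega}) = 2 - 2r_{\infty\omega} = 2 - \frac{2r_{12}}{\sqrt{r_{1I}\,r_{2II}}}$$
A group measure  [11:33a]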

This measure of the extent to which the members
of the group in their entirety are different
(within themselves) in the two abilities would
be of importance in a study of several groups
which have had different types of tutelage. Further,
if the variance error of $r_{\infty\omega}$ as given by
[13:88] is such that there is negligible probability
that this variance is due to chance, it
may then be appropriate to secure individual
measures of difference, [11:28], or [11:34].


$$\bar{d}_{12} = r_{1I}z_1 - r_{2II}z_2$$
Regressed measure of within-the-individual
difference (see [11:37])  [11:34]

Writing $\bar{d}_{12}$ as the sum of a non-chance part and
the chance part $r_{1I}e_1 - r_{2II}e_2$, in which $e_1$ and
$e_2$ are the chance errors in $z_1$ and $z_2$, we obtain an
analysis of the variance into non-chance and chance
portions.

$$V(\bar{d}_{12}) = [V(\bar{d}_{12}) - V(\bar{d}_{12.\infty\omega})] + V(\bar{d}_{12.\infty\omega})$$  [11:34a]

$$r_{1I}^2 + r_{2II}^2 - 2r_{1I}r_{2II}r_{12}
= (r_{1I}^3 + r_{2II}^3 - 2r_{1I}r_{2II}r_{12})
+ (r_{1I}^2 - r_{1I}^3 + r_{2II}^2 - r_{2II}^3)$$  [11:34b]
Of course the restrictions which have been imposed
are nonlinear, so a variance ratio test
would not be precise, but ordinarily we can secure
serviceable information by dealing with
critical ratios. The individual
$d_{12}/\sigma(d_{12.\infty\omega})$ is a critical ratio. The mean
square of such may be compared with the mean square of
$\bar{d}_{12}/\sigma(\bar{d}_{12.\infty\omega})$ in order to ascertain
which measure, $d_{12}$ or $\bar{d}_{12}$, is the more efficient.

$$\frac{V(d_{12})}{V(d_{12.\infty\omega})} = \frac{2 - 2r_{12}}{2 - r_{1I} - r_{2II}}$$  [11:35]

$$\frac{V(\bar{d}_{12})}{V(\bar{d}_{12.\infty\omega})}
= \frac{r_{1I}^2 + r_{2II}^2 - 2r_{1I}r_{2II}r_{12}}{r_{1I}^2 - r_{1I}^3 + r_{2II}^2 - r_{2II}^3}$$  [11:36]


Substitution of numerical values in these equations
shows that in case reliabilities are
greater than .50 the regressed, or $\bar{d}_{12}$, measure
of difference is more trustworthy than the
non-regressed, or $d_{12}$, measure.

The proportion of cases whose idiosyncrasies
(within individual differences) are revealed by
fallible measures. We will consider this problem
first when the observed measure of difference is
$d_{12}$ and second when $\bar{d}_{12}$. Let us assume that
all distributions of differences are normal.
The ratio $\sigma(d_{12.\infty\omega})/\sigma(d_{12})$ is the ratio of the
standard deviation of differences due to chance
to the standard deviation of the observed
differences. As this ratio becomes small an
increasing number of the observed differences are
not artifacts.
Chart XI I illustrates the situation. The
dash-curve is the distribution of differences
attributable to chance and the full-line curve is
that of the observed differences, so that the
shaded portion is the proportion of differences
in excess of the chance proportion. Table XI A,
herewith (see Kelley, 1923), gives this proportion
for different values of the ratio. Chart
XI I approximately represents the situation found
in connection with Stanford Achievement Test
measures of Arithmetic Computation and of
Arithmetic Reasoning, for 96 eighth-grade pupils.


CHART XI I


TABLE XI A
NON-CHANCE DIFFERENCES FOR DIFFERENT RATIOS

(Columns: ratio of the σ due to chance to the σ of the
observed differences; proportion of differences in
excess of the chance proportion.)


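Arithmetic Computation score = $x_1$ and has
a reliability of .669.

Arithmetic Reasoning score = $x_2$ and has
a reliability of .825.

$$\sigma(d_{12.\infty\omega}) = \sqrt{2 - r_{1I} - r_{2II}} = .711$$
$$\sigma(d_{12}) = \sqrt{2 - 2r_{12}} = 1.246$$

Ratio = .571, equivalent to 26 per cent of
cases having differences in excess of chance.

$$\sigma(\bar{d}_{12.\infty\omega}) = \sqrt{r_{1I}^2 - r_{1I}^3 + r_{2II}^2 - r_{2II}^3} = .517$$
$$\sigma(\bar{d}_{12}) = \sqrt{r_{1I}^2 + r_{2II}^2 - 2r_{1I}r_{2II}r_{12}} = .939$$

Ratio = .551, equivalent to 27 per cent of
cases having differences in excess of chance.

For these same data the reliability of $d_{12}$,
as given by [11:29a], is .674, and the reliability
of $\bar{d}_{12}$, as given by [11:37], is .697.

$$r(\bar{d}_{12}, \bar{d}_{I\,II})
= \frac{r_{1I}^3 + r_{2II}^3 - 2r_{1I}r_{2II}r_{12}}{r_{1I}^2 + r_{2II}^2 - 2r_{1I}r_{2II}r_{12}}$$
(See [11:34])  [11:37]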
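
The Stanford Achievement figures can be recomputed as a check on the formulas of this subsection. The sketch assumes the variance expressions $2 - r_{1I} - r_{2II}$, $2 - 2r_{12}$, $r_{1I}^2 - r_{1I}^3 + r_{2II}^2 - r_{2II}^3$, and $r_{1I}^2 + r_{2II}^2 - 2r_{1I}r_{2II}r_{12}$. The intercorrelation $r_{12}$ is not quoted in the text, so it is recovered here from the reported standard deviation 1.246; the function and key names are illustrative.

```python
import math


def difference_summary(r1, r2, r12):
    """Standard deviations, ratios, and reliabilities for the raw and the
    regressed standard-score difference between two fallible measures."""
    chance = 2.0 - r1 - r2                                # [11:30a]
    total = 2.0 - 2.0 * r12                               # [11:28a]
    reg_chance = r1**2 - r1**3 + r2**2 - r2**3            # chance part of [11:34b]
    reg_total = r1**2 + r2**2 - 2.0 * r1 * r2 * r12       # left member of [11:34b]
    return {
        "sd_chance": math.sqrt(chance),
        "sd_observed": math.sqrt(total),
        "ratio": math.sqrt(chance / total),
        "reliability": 1.0 - chance / total,              # equals [11:29a]
        "reg_sd_chance": math.sqrt(reg_chance),
        "reg_sd_observed": math.sqrt(reg_total),
        "reg_ratio": math.sqrt(reg_chance / reg_total),
        "reg_reliability": 1.0 - reg_chance / reg_total,  # equals [11:37]
    }


# Assumption: r12 is inferred from sigma(d12) = 1.246, since 2 - 2*r12 = 1.246**2.
r12 = 1.0 - 1.246**2 / 2.0
summary = difference_summary(0.669, 0.825, r12)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Writing each reliability as one minus the chance share of the variance makes the connection with the ratio of standard deviations explicit: the reliability is one minus the square of that ratio.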


Reliability and other issues involving three or more
unequally reliable measures of the same thing


A typical situation would exist when several
judges separately rate, or score, the same set
of objects, be they people, cattle, art products,
or what not. The rating by each judge is upon
some commonly defined scale like "impulsiveness
of individuals," "general excellence of beef
stock," "beauty of art sample," etc. Since the
impulsiveness of a person can scarcely be better
defined than as equal to the average rating, or
score, given by an adequate number of those
competent to rate, it is appropriate and useful to
assert that the judges are rating the same thing,
though they may be unequally expert in doing it.
Similarly for many other situations. We may
thus write for the scores as given by the k
judges

$$x_1 = c_1x_\infty + e_1;\quad x_2 = c_2x_\infty + e_2;\quad \dots\quad x_k = c_kx_\infty + e_k$$

wherein the c's change from judge to judge and
depend upon the respective scales used, wherein
$x_\infty$ is the true score attaching to the object
scored; and wherein the e's are errors of judgment,
uncorrelated with each other or with the
true scores.

When we have three judges it follows, as shown
by Shen* (1925), that the reliability of a single
judge, say judge number 1, is

$$r_{1I} = \frac{r_{12}\,r_{13}}{r_{23}}$$
Triad formula for reliability  [11:38]


The variance error of $r_{1I}$ thus determined is


2 2 1 
ү =A гун 1123 (Cor) 


бу 
N-2 DEOS 


1 1 

(—+—) - 5] [11:39] 
2 2 

22 5 


* Formula [11:39] herewith differs slightly from Shen's formula,
in which there is a typographical error.

If the number of measures of the same thing
is some number k, greater than three, we first
express each set of measures as standard scores,
thus $z_1 = (X_1 - M_1)/\sigma_1$; $z_2 = (X_2 - M_2)/\sigma_2$;
... $z_k = (X_k - M_k)/\sigma_k$. We let
$s = z_1 + z_2 + \dots + z_k$ and let $s_1 = s - z_1$. Since
the correlation between $z_1$ and $s_1$ corrected for
attenuation is equal to 1.00, we have the reliability
of the first set of measures as derived by Shen [11:40]


гү Е ЧЕ [11:40] 


Reliability from the intercorrelations between unequally
excellent measures of the same thing


In this formula r,, is the average of the 
k(k-1)/2 intercorrelations between the k measures, 
and r,, is the average of the (k-1) correlations 
between z, and the (k-1) remaining measures. 

Frequently гү, can be gotten by [10:105] and 
гүү by [10:103] ог by these formulas modified to 
be appropriate to standard score variables. 

If we have four measures of the same thing,
but unequally reliable, it is simple to prove
that the three following tetrads, two of which
are independent, equal zero.

$$t_{1234} = r_{12}r_{34} - r_{13}r_{24} = 0$$
$$t_{1243} = r_{12}r_{34} - r_{14}r_{23} = 0$$
$$t_{1342} = r_{13}r_{24} - r_{14}r_{23} = 0$$
Tetrads when a single general factor accounts for
all intercorrelations  [11:41]

The variance error of $t_{1234}$ as derived by Kelley
(1928) is
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УСЕ зза) = [rio +гүу+гг„+гз,+2г 


1-2 12714723734 


Hr оон Гозо ГуРу Гоц 
3 2 2 1.2 
2гізГуцГуц 27 Гоц уч t f123 (715 Газ 


desto Hr КӨЙ [11:42] 


Thus, in the case of four variables, we may
say that they may be conceived as due to a general
factor plus four specific factors when

$$r_{12}r_{34} = r_{13}r_{24} = r_{14}r_{23}$$
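
The triad and tetrad relationships can be verified on constructed data. The sketch assumes the judges' model written earlier in this subsection, under which the correlation between two measures is the product of their correlations with the true score, these being the square roots of the reliabilities. The function names and the four reliabilities are hypothetical.

```python
import math


def one_factor_correlations(reliabilities):
    """Intercorrelations implied when every measure is the same true score
    plus independent error: r_ij = sqrt(r_iI) * sqrt(r_jJ)."""
    loadings = [math.sqrt(r) for r in reliabilities]
    size = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(size)]
            for i in range(size)]


def triad_reliability(r, i, j, k):
    """Reliability of measure i from its correlations with j and k, as in [11:38]."""
    return r[i][j] * r[i][k] / r[j][k]


def tetrads(r):
    """The three tetrad differences of [11:41] for the first four measures."""
    return (
        r[0][1] * r[2][3] - r[0][2] * r[1][3],
        r[0][1] * r[2][3] - r[0][3] * r[1][2],
        r[0][2] * r[1][3] - r[0][3] * r[1][2],
    )


# Hypothetical reliabilities for four judges.
r = one_factor_correlations([0.81, 0.64, 0.49, 0.36])
print(round(triad_reliability(r, 0, 1, 2), 3))   # recovers the first reliability
print([round(t, 12) for t in tetrads(r)])        # all three vanish
```

With sample correlations the tetrads will not vanish exactly, which is why a variance error for the tetrad is needed before drawing conclusions.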

Kelley has shown that the pentad function
following equals zero when five variables may be
thought of as consequent to two general factors
plus specific factors:


Ё оза гу 13 ou 35155 T 149 15 23724735 7121 yl 251 34755 
Tr3371 Г 34 174271512514 Fu 5 T1 4T14 ГРОБ tan 
Ty Ml 25735145 1131 151 231 24145 T12115T 24 3ч Гу; 


“гүзүү ouf 257 3571 12713. 25734745 “14 15. 23125758 
[11:43] 


The computation of the variance error of the
function is rather laborious. The relationships
immediately preceding and still other general
relationships connected with "factor analysis"
and the minimal necessary number of independent
variables underlying a given set of correlations
can be well expressed in the notation of matrix
algebra.

The weighting factor which is to be attached
to a variable because of its unreliability. Let
the standard scores $z_1, z_2, z_3, \dots$ each be
measures of $z_\infty$ (and of no other true measure) and
have chance errors $e_1, e_2, e_3, \dots$ whose
variances are unequal. To allow for different
units of measurement we write, $r_1, r_2, r_3, \dots$
being the reliabilities:

$$z_1 = a_1z_\infty + e_1, \text{ and } V_1 = 1 = a_1^2V_\infty + (1 - r_1)$$
$$z_2 = a_2z_\infty + e_2, \text{ and } V_2 = 1 = a_2^2V_\infty + (1 - r_2)$$
$$z_3 = a_3z_\infty + e_3, \text{ and } V_3 = 1 = a_3^2V_\infty + (1 - r_3), \text{ etc.}$$

($V_\infty$ as here defined being different from $V_\infty$ as
defined by [11:16] and [11:01].)

From the variances we note that

$$\frac{a_1}{\sqrt{r_1}} = \frac{a_2}{\sqrt{r_2}} = \frac{a_3}{\sqrt{r_3}} = \dots$$

so that we may now redefine $z_\infty$ and write the z's
thus:

$$z_1 = \sqrt{r_1}\,z_\infty + e_1, \text{ etc.}$$

Taking the variance of $z_1$ we have, $1 = r_1V_\infty + (1 - r_1)$,
so $V_\infty = 1$.

$$\frac{z_1}{\sqrt{r_1}} = z_\infty + \frac{e_1}{\sqrt{r_1}}$$
$$\frac{z_2}{\sqrt{r_2}} = z_\infty + \frac{e_2}{\sqrt{r_2}}$$

etc.
The left-hand members are unbiased estimates of
$z_\infty$. By [13:19] the best combination of them
will be that given by weighting them inversely
as their variance errors. The variance error of
$z_1/\sqrt{r_1}$ (equal to that of $e_1/\sqrt{r_1}$) is
$(1 - r_1)/r_1$. The inverse of this is
the proper weighting factor of $z_1/\sqrt{r_1}$, so that
the best estimate of $z_\infty$ is:

$$\bar{z}_\infty = \frac{\dfrac{\sqrt{r_1}}{1 - r_1}z_1 + \dfrac{\sqrt{r_2}}{1 - r_2}z_2 + \dfrac{\sqrt{r_3}}{1 - r_3}z_3 + \dots}
{\dfrac{r_1}{1 - r_1} + \dfrac{r_2}{1 - r_2} + \dfrac{r_3}{1 - r_3} + \dots}$$  [11:44]

Otherwise expressed, the ratios of weighting factors
to allow for differences in reliabilities
as otherwise derived by Kelley (1927, pp. 211-213)
are

$$\frac{\sqrt{r_1}}{1 - r_1} : \frac{\sqrt{r_2}}{1 - r_2} : \frac{\sqrt{r_3}}{1 - r_3} : \dots$$  [11:45]


Of course these weights do not overtly modify
or supplant the weights in a multiple regression
equation, but lacking regression equation weights
these weighting factors may confidently be expected
to be beneficial. Table XI B herewith provides
these weights for different reliability
coefficients.


TABLE XI B
WEIGHTING FACTORS CONSEQUENT TO VALUES

(CONTINUED)

  r    weight      r    weight      r    weight      r    weight

 .07   .284       .32   .832       .57   1.76       .82    5.03
 .08   .307       .33   .857       .58   1.81       .83    5.36
 .09   .330       .34   .883       .59   1.87       .84    5.73
 .10   .351       .35   .910       .60   1.94       .85    6.15-

 .11   .373       .36   .938       .61   2.00       .86    6.62
 .12   .394       .37   .966       .62   2.07       .87    7.17
 .13   .414       .38   .994       .63   2.15-      .88    7.82
 .14   .435+      .39   1.024      .64   2.22       .89    8.58
 .15   .456       .40   1.054      .65   2.30       .90    9.49

 .16   .476       .41   1.085+     .66   2.39       .91   10.60
 .17   .497       .42   1.117      .67   2.48       .92   11.99
 .18   .517       .43   1.150+     .68   2.58       .93   13.78
 .19   .538       .44   1.185-     .69   2.68       .94   16.16
 .20   .559       .45   1.220      .70   2.79       .95   19.49

 .21   .580       .46   1.256      .71   2.91       .96   24.49
 .22   .601       .47   1.294      .72   3.03       .97   32.83
 .23   .623       .48   1.332      .73   3.16       .98   49.50
 .24   .645-      .49   1.373      .74   3.31       .99   99.50
 .25   .667       .50   1.414      .75   3.46
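
The entries of Table XI B can be regenerated, and the weights applied, with a few lines. The sketch assumes the weighting factor $\sqrt{r}/(1 - r)$ and the combined estimate of [11:44]; the function names and the example scores are illustrative.

```python
import math


def reliability_weight(reliability):
    """Weighting factor attached to a standard score of the given reliability."""
    return math.sqrt(reliability) / (1.0 - reliability)


def best_estimate(z_scores, reliabilities):
    """Weighted combination of several fallible standard scores, as in [11:44]."""
    numerator = sum(reliability_weight(r) * z for z, r in zip(z_scores, reliabilities))
    denominator = sum(r / (1.0 - r) for r in reliabilities)
    return numerator / denominator


for reliability in (0.25, 0.50, 0.75, 0.99):
    print(reliability, round(reliability_weight(reliability), 3))

# Hypothetical standard scores of one person on three measures of the same thing.
print(round(best_estimate([1.2, 0.8, 1.5], [0.90, 0.60, 0.75]), 3))
```

Because the weight rises steeply as the reliability approaches 1.00, a single highly reliable measure dominates the composite.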


SECTION 2. THE EFFECT OF VARIABILITY 
IN RANGE UPON CORRELATION 


The effect of differences in range of talent
examined upon reliability coefficients. Consider
a test with scores, say, from 0 to 50, yielding
a mean of about 25 when given to the group to
which it is best adapted. If given to a markedly
inferior group the scores will tend to cluster
around 0, and if to a markedly superior group
they will tend to cluster around 50, and if given
to a group so heterogeneous as to include
inferior, average, and superior, the reliability
scatter diagram involving scores $X_1$ and $X_I$ upon
two similar forms of the test will tend to have
lanceolate contour lines. This is evidence
of malfunctioning of the instrument at both low
and high levels. If the malfunctioning is at
one of these levels only, pear-shaped contour
lines will be found. In connection with
reliability the first observation to make is of
the reliability scatter diagram in order to
ascertain the range of effectiveness of the
instrument. The lanceolate contour lines indicate
poor functioning of the instrument, but this is
not revealed by the reliability coefficient,
which in this case is high and accordingly very
misleading.

If the instrument functions equally well
throughout the range of talent tested, then the
variability of differences $(X_1 - X_I)$ will tend to
be the same at all levels. The variability of
$(X_1 - X_I)$ is a variability at right angles to the
major axis of the contour ellipses in the $(X_1, X_I)$
scatter diagram and can be estimated by inspection
of the scatter diagram, or it can be computed
for different levels, a subject's level
being determined by $(X_1 + X_I)/2$. For an
instrument of uniform excellence we have

$$V(x_1 - x_I) = V_1 + V_I - 2\sigma_1\sigma_I r_{1I} = 2V_1(1 - r_{1I})$$

thus

$$V_1(1 - r_{1I}) = \text{a constant}$$  [11:46]

is a situation which maintains throughout the
range of equal effectiveness of the instrument.
This immediately provides us with a means of
estimating the reliability for one range of talent
knowing it for a different range. We will use
lower-case letters to indicate a narrower range
and capital letters a wider range. We have:

$$v_1(1 - r_{1I}) = V_1(1 - R_{1I})$$
Relationship between reliability and variability in the
case of an instrument of uniform merit  [11:46a]

Introducing the variability of true scores, this
relationship becomes:

$$\frac{v_\infty(1 - r_{1I})}{r_{1I}} = \frac{V_\infty(1 - R_{1I})}{R_{1I}}$$  [11:47]

We may note that $v_1(1 - r_{1I})$ is the variance of
$(x_1 - x_\infty)$ so that equality, for different ranges,
of the variance of the differences $(x_1 - x_I)$ is
equivalent to the equality, for different ranges,
of the variance of the difference between observed
scores and true scores (see Kelley, 1921).
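
Relationship [11:46a] turns into a one-line estimate of the reliability in one range from that in another. The sketch below assumes the form $v_1(1 - r_{1I}) = V_1(1 - R_{1I})$; the function names and the variances are hypothetical and are not the Stanford Spelling Test data.

```python
def narrow_range_reliability(wide_reliability, wide_variance, narrow_variance):
    """Solve v(1 - r) = V(1 - R) for the narrow-range reliability r."""
    return 1.0 - wide_variance * (1.0 - wide_reliability) / narrow_variance


def wide_range_reliability(narrow_reliability, narrow_variance, wide_variance):
    """Solve the same relationship for the wide-range reliability R."""
    return 1.0 - narrow_variance * (1.0 - narrow_reliability) / wide_variance


# Hypothetical case: reliability .95 over several grades combined (variance 400),
# estimated for a single grade whose variance is 100.
r_grade = narrow_range_reliability(0.95, wide_variance=400.0, narrow_variance=100.0)
print(round(r_grade, 3))
print(round(wide_range_reliability(r_grade, 100.0, 400.0), 3))
```

A wide-range coefficient of .95 thus corresponds to a much lower single-grade coefficient when the combined variance is four times the grade variance.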

Table XI C, herewith, illustrates how closely
relationship [11:46a] is maintained in the case
of one instrument designed to be effective from
the second to the ninth grades.*


TABLE XI C
DATA UPON STANFORD SPELLING TEST, FORMS A AND B

Number of cases approximately 165 per school grade

(Columns: school grades, with a line for grades 2 to 9
inclusive; means; variances; actual reliability
coefficients; reliability coefficients estimated from
the total, using [11:46a].)


* Kelley, Ruch, and Terman, STANFORD ACHIEVEMENT TEST MANUAL
OF DIRECTIONS, 1925 and revised, 1927, Tables 2 and 4.

The differences between the actual and the
estimated grade reliability coefficients are
somewhat greater than may reasonably be attributed
to chance. However, lacking specific narrow-range
determinations, formula [11:46a] should be
used to estimate the narrow-range reliability
knowing the wide-range value.

For any problem the reliability coefficient
to be used is that which is appropriate to the
normal competitive group implied in the problem.
In school matters the normal competitive group is
the school grade and certainly not the group
given by combining grades 2 to 9. Even to report
a reliability coefficient for a noncompetitive
wide-range group is misinformative to all
except the statistically expert.

If an individual may be considered as either
a member of a narrow or of a wide-range group we
ask the question of whether the connection in
which we study him makes any difference. We
have two formula [11:20] estimates of his ability:

$$\bar{x}_\infty = r_{1I}X_1 + (1 - r_{1I})m_1$$, using narrow-range
statistics;
$$\bar{X}_\infty = R_{1I}X_1 + (1 - R_{1I})M_1$$, using wide-range
statistics.

Formula [11:22] provides us with the variance
errors of these estimates. Utilizing [11:47] we
find that the ratio of these variance errors is

$$\frac{v_{\infty.1}}{V_{\infty.1}} = \frac{r_{1I}}{R_{1I}} < 1$$

showing that more reliable results are always
obtained by using the narrow-range group. In
short, if the individual may be treated as a
member of a number of groups, the most reliable
results will be obtained by treating him as a
member of the most homogeneous of these, which,
of course, is the group for which the reliability
coefficient is the smallest.

The effect of curtailment in range of a first
variable upon the reliability coefficient of a
second variable. If a reliability coefficient
is computed for a range of talent which has been
restricted in a known manner upon the basis of
another variable, it obviously is, in general, an
inaccurate measure of the reliability when no
such restriction is present. For example, the
reliability of a test, $X_2$, computed for those
who have attained a certain passing mark in $X_1$
is, in case $X_1$ and $X_2$ are correlated, smaller
than the reliability of $X_2$ for a group consisting
of those who have failed as well as passed $X_1$.
As deduced from a formula given by Davis (1944),
the reliability, $R_{2II}$, in the wide range, knowing
that in the narrow range, $r_{2II}$, knowing the
correlation, $r_{12}$, in the narrow range, and knowing
$V_1/v_1$, the ratio of the unrestricted variance in
$X_1$ to the restricted variance, is

$$R_{2II} = \frac{r_{2II} + r_{12}^2\left(\dfrac{V_1}{v_1} - 1\right)}{1 + r_{12}^2\left(\dfrac{V_1}{v_1} - 1\right)}$$  [11:48]

The effect of curtailing the range of one of
the variables entering into a correlation problem.
Let the variables have the correlation $R_{12}$, and
let it further be true that the arrays are
homoscedastic so that $V_{2.1}$ is constant for successive
$X_1$ values. If there is an imposed alteration in
the $X_1$ values so that one or more of the arrays
of $X_2$ values are removed entirely, or if a
randomly chosen number of cases is withdrawn from
any one of these arrays, it will not change
either $V_{2.1}$ or $B_{21}$. Thus, letting lower-case
letters stand for the situation after the imposed
curtailment in the $X_1$ measures and a consequential
curtailment in the $X_2$ measures, we have:

$$v_{2.1} = V_{2.1}, \text{ or } v_2(1 - r_{12}^2) = V_2(1 - R_{12}^2)$$  [11:49]

$$b_{21}^2 = B_{21}^2, \text{ or } \frac{v_2}{v_1}r_{12}^2 = \frac{V_2}{V_1}R_{12}^2$$  [11:50]

so that

$$\frac{r_{12}^2}{v_1(1 - r_{12}^2)} = \frac{R_{12}^2}{V_1(1 - R_{12}^2)}$$
Relationship when a $v_1/V_1$ relationship is imposed  [11:51]

Of course, when selection is on the basis of the
second variable, $v_{2.1} \neq V_{2.1}$ and $b_{21} \neq B_{21}$.
Though there has been alteration in the variances
of both variables, only the $v_1/V_1$ ratio is
material in estimating the change in the
correlation coefficient. Solving [11:51] for $r_{12}$, we
have, as derived by Pearson (1902 Inf.),

$$r_{12} = \frac{R_{12}\sqrt{v_1/V_1}}{\sqrt{1 - R_{12}^2 + R_{12}^2\,(v_1/V_1)}}$$  [11:52]


Table XI D gives the change in correlation
with different curtailments (see Kelley, 1923).


TABLE XI D
RELATIONSHIP BETWEEN CORRELATION AND AN IMPOSED
CURTAILMENT OF ONE OF THE VARIABLES

IF sqrt(v_1/V_1) =   AND r =  .10    .20    .30    .40    .60    .80    .95
                     THEN R =
       .75                    .133   .263   .387   .503   .707   .872   .971
       .50                    .197   .378   .532   .658   .832   .936   .987
       .25                    .373   .632   .783   .868   .949   .983   .997
       .10                    .709   .898   .953   .975   .991   .997   .9995
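
The entries of Table XI D can be regenerated from relationship [11:51]. The sketch assumes that the row heading is the ratio of the curtailed to the uncurtailed standard deviation, that is, the square root of $v_1/V_1$, an interpretation that the legible rows bear out; the function names are illustrative.

```python
import math


def wide_range_r(narrow_r, sd_ratio):
    """R in the uncurtailed group, from r and sd_ratio = sqrt(v1/V1), via [11:51]."""
    odds = narrow_r**2 / (1.0 - narrow_r**2) / sd_ratio**2
    return math.sqrt(odds / (1.0 + odds))


def narrow_range_r(wide_r, sd_ratio):
    """r in the curtailed group, from R and sd_ratio, as in [11:52]."""
    k = sd_ratio**2
    return wide_r * math.sqrt(k / (1.0 - wide_r**2 + wide_r**2 * k))


for sd_ratio in (0.75, 0.50, 0.25, 0.10):
    row = [wide_range_r(r, sd_ratio) for r in (0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.95)]
    print(f"{sd_ratio:.2f}  " + "  ".join(f"{value:.3f}" for value in row))
```

The two functions are inverses of one another, so either direction of the correction can be checked against the other.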

An interesting possible use of [11:51] lies
in the study of biological phenomena. If, for
some form of life, in one geographical region
two characters have the constants $v_1$, $v_2$, $r_{12}$,
and in a second distant region the constants $V_1$,
$V_2$, $R_{12}$, and if it is believed that the life in
one region is a migrant and variable form of
that in the other, then the two characters $X_1$
and $X_2$ may be investigated to see which is the
more causal in causing the selection. If
$r_{12}^2/[v_1(1 - r_{12}^2)]$ is more nearly equal to
$R_{12}^2/[V_1(1 - R_{12}^2)]$ than $r_{12}^2/[v_2(1 - r_{12}^2)]$
is to $R_{12}^2/[V_2(1 - R_{12}^2)]$, then
the indication is that $X_1$ is more closely
connected with the cause of selection than is $X_2$.

If the curtailment is an arbitrary cutting off in the case of one variable of a portion of a distribution which is believed to be approximately normal when uncurtailed, the ratio $v_1/V_1$ can be readily determined, knowing the portion which has been deleted. For example, let us say that Test $X_1$ is employed as an admission-to-training-in-the-Air-Corps instrument; that the form of distribution on this test of all applicants is approximately normal; that arrays are essentially homoscedastic; and that the lowest 25 per cent of those taking the test are not admitted. For those admitted, the correlation of $X_1$ with attained flying efficiency is $r_{10}$. What reasonably would have been the correlation had all the applicants been admitted to training? We find the ratio $v_1/V_1$ by [8:29], introduce the result into [11:52] and find the $R_{10}$ which is equivalent to $r_{10}$. Had $r_{10}$ been .50 the answer would be $R_{10} = .62$.*
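
The Air Corps illustration can be checked numerically. The sketch below is not the book's [8:29]; it assumes the standard variance formula for a unit normal distribution cut off from below, and it solves [11:51] for the uncurtailed correlation. The function names are hypothetical, and the selected variable plays the part of the second variable in [11:51].

```python
from statistics import NormalDist

def variance_ratio_after_cut(p_removed):
    """v/V for a unit normal variable after the lowest p_removed fraction is deleted."""
    nd = NormalDist()
    a = nd.inv_cdf(p_removed)             # cut point in standard-score units
    lam = nd.pdf(a) / (1.0 - p_removed)   # mean of the retained portion
    return 1.0 + a * lam - lam * lam      # variance of the retained portion

def uncurtailed_r(r, ratio):
    """Solve [11:51] for R, given the curtailed r and ratio = v/V of the selected variable."""
    odds = (r * r) / ((1.0 - r * r) * ratio)   # equals R^2 / (1 - R^2)
    return (odds / (1.0 + odds)) ** 0.5

ratio = variance_ratio_after_cut(0.25)
R = uncurtailed_r(0.50, ratio)
print(round(ratio, 3), round(R, 2))
```

Under these assumptions the retained 75 per cent keep a little over half of the original variance, and the corrected correlation rounds to the .62 quoted in the text.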

* This procedure extensively used by Carl Brown, in Robert M. Yerkes, Ed., Psychological Examining in the U.S. Army, Nat. Acad. of Sci., Vol. 15, 1921.

The effect of double selection upon correlation. This problem was attacked by Pearson (1908) in the case of normal bivariate correlation. The explicit solution is provided in Formula [11:53], derived by Vern James.*


$$r_{12} = \frac{-(1-R_{12}^2)\sqrt{V_1V_2} + \sqrt{(1-R_{12}^2)^2\,V_1V_2 + 4v_1v_2R_{12}^2}}{2\sqrt{v_1v_2}\;R_{12}} \qquad [11:53]$$


The specific situation to which this formula ap- 
plies requires that (a) the distribution prior to 
selection, represented by capital letters, be a 
normal bivariate distribution and (b) that the 
distribution after selection, represented by 
lower-case letters, be a normal bivariate dis- 
tribution. 
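Formula [11:53] may be written

$$\frac{r_{12}}{\sqrt{v_1v_2}\,(1-r_{12}^2)} = \frac{R_{12}}{\sqrt{V_1V_2}\,(1-R_{12}^2)} \qquad [11:54]$$

or again

$$\frac{r_{12}}{\sqrt{v_{1.2}\,v_{2.1}}} = \frac{R_{12}}{\sqrt{V_{1.2}\,V_{2.1}}} \qquad [11:55]$$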

revealing an invariant relationship which may 
well be important in the study of natural selec- 
tion. 
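
A short numerical check of [11:53] and [11:54] may help. The sketch below assumes the forms of those two formulas as written here; the function names and the illustrative variances are hypothetical.

```python
import math

def r_after_double_selection(R, V1, V2, v1, v2):
    """Correlation after selection on both variables, following [11:53]. R must not be zero."""
    a = (1.0 - R * R) * math.sqrt(V1 * V2)
    return (-a + math.sqrt(a * a + 4.0 * v1 * v2 * R * R)) / (2.0 * math.sqrt(v1 * v2) * R)

def invariant(r, var1, var2):
    """The quantity in [11:54], which should be the same before and after selection."""
    return r / (math.sqrt(var1 * var2) * (1.0 - r * r))

# Hypothetical case: both variances halved by selection, original correlation .60.
r = r_after_double_selection(0.60, 1.0, 1.0, 0.5, 0.5)
print(round(r, 4), round(invariant(r, 0.5, 0.5), 4), round(invariant(0.60, 1.0, 1.0), 4))
```

The two invariant values agree, which is the relationship [11:54] asserts.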


* A class paper submitted to the writer by Vern James, February, 1925. James made the following happy substitutions:
He let Pearson's a-t) = X; Pearson's ü-£izy; Pearson's 


2/03 = a; Pearson's 2/0, = b. 
SECTION 3. THREE-VARIABLE MULTIPLE CORRELATION


The three-variable problem will be approached in several ways because it involves practically all of the issues inherent in the general, or k-variable, problem and is algebraically much simpler than the general multiple-correlation problem.

Let $X_0$, the criterion or predictand, be a measure which it is desired to estimate from a knowledge of $X_1$ and $X_2$, the independent variables or predictors. There is so great a simplification when the regressions of $X_0$ are linear that it is desirable to transform $X_1$ and $X_2$ into new variables yielding linear relationships with $X_0$, if a preliminary study of the $X_0$, $X_1$ scatter diagram, and/or of the $X_0$, $X_2$ scatter diagram, reveals demonstrable nonlinearity. Though nonlinear regression of $X_1$ or $X_2$ upon $X_0$ is not a major detriment, the writer has frequently found that a single transformation of the $X_0$ variable will rectify the regression of $X_0$ upon a number of other variables, $X_1$, $X_2$, $X_3$, etc., a result otherwise accomplished only by making transformations in all of the predictor variables.

In the following treatment we assume that the regression of $X_0$ upon $X_1$ and upon $X_2$ is approximately linear, and that an equation of the type

$$\bar{X}_0 = a + b_1X_1 + b_2X_2 \qquad [11:56]$$
3-variable multiple regression equation

$$\bar{x}_0 = b_1x_1 + b_2x_2 \qquad [11:56a]$$

will constitute a sound basis for estimating $X_0$. A more explicit notation for the regression coefficients is $b_{01.2}$, read "the regression of $X_0$ upon $X_1$ for $X_2$ constant," and $b_{02.1}$. Where there is no ambiguity we shall use the shorter notation for both $b$ and $\beta$ (see [11:57]) regression coefficients.

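Let standard $z_0$, $z_1$, and $z_2$ scores be as defined by [8:23]. Then [11:56] is represented by an equation of the type

$$\bar{z}_0 = \beta_1z_1 + \beta_2z_2 \qquad [11:57]$$
3-variable regression equation, standard scores

We have

$$b_1 = \beta_1\frac{\sigma_0}{\sigma_1} \quad\text{and}\quad b_2 = \beta_2\frac{\sigma_0}{\sigma_2} \qquad [11:58]$$

$$a = M_0 - b_1M_1 - b_2M_2 \qquad [11:59]$$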


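When the standard score $z_0$ is estimated by means of [11:57] the variance error of estimate, $k_{0.12}^2$, is as follows:

$$k_{0.12}^2 = \frac{\sum (z_0 - \beta_1z_1 - \beta_2z_2)^2}{N} = 1 + \beta_1^2 + \beta_2^2 - 2\beta_1r_{01} - 2\beta_2r_{02} + 2\beta_1\beta_2r_{12} \qquad [11:60]$$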


It is desired so to determine the $\beta$'s that $k_{0.12}^2$ shall be a minimum. We may, without loss of generality, write $\beta_1 = y - \beta_2r_{12}$ and introduce $y$ into equation [11:60]. We obtain,

$$k_{0.12}^2 = [\,y - r_{01}\,]^2 + 1 + \beta_2^2 - 2\beta_2r_{02} - (r_{01} - \beta_2r_{12})^2$$

To make $k_{0.12}^2$ a minimum $y$ must be given such a value as to make the [] term, which is a perfect square, equal to zero. Then

$$k_{0.12}^2 = 1 + \beta_2^2 - 2\beta_2r_{02} - r_{01}^2 + 2\beta_2r_{01}r_{12} - \beta_2^2r_{12}^2$$

$$= \left[\beta_2\sqrt{1-r_{12}^2} - \frac{r_{02}-r_{01}r_{12}}{\sqrt{1-r_{12}^2}}\right]^2 + 1 - r_{01}^2 - \frac{(r_{02}-r_{01}r_{12})^2}{1-r_{12}^2}$$
This is a minimum when the [] term, which is a perfect square, is equal to zero, thus

$$\beta_2 = \frac{r_{02} - r_{01}r_{12}}{1-r_{12}^2} \qquad [11:61]$$

$$\beta_1 = \frac{r_{01} - r_{02}r_{12}}{1-r_{12}^2} \qquad [11:61a]$$
$\beta$, or standard score, multiple regression coefficient (3 variables)

$$k_{0.12}^2 = \frac{1 - r_{01}^2 - r_{02}^2 - r_{12}^2 + 2r_{01}r_{02}r_{12}}{1-r_{12}^2} \qquad [11:62]$$
Variance error of estimate of standard score criterion (3 variables)

The correlation between $z_0$ and $\bar{z}_0$ is the multiple correlation coefficient and is always positive. Since there may be various estimates of $z_0$, a more explicit notation than $\bar{z}_0$ is needed. We shall use $z_{0\Delta12}$, which may be read "that part of $z_0$ which is dependent upon $z_1$ and $z_2$." The subscript $\Delta$ may be thought of as standing for the word "dependent," just as in $z_{0.12}$, "that part of $z_0$ which is independent of $z_1$ and $z_2$," the subscript dot may be thought of as the dot to the letter i in the word "independent." The correlation between $z_0$ and the best linear combination of $z_1$ and $z_2$ is a multiple correlation. We shall designate it $r_{0\Delta12}$. Other notations are $r_{0(12)}$, $R_{0(12)}$, and $r_{0.12}$, but this last, which the writer has employed earlier, will be avoided as incongruous with the "independent of" concept which, as here used, attaches to the subscript dot. For the correlation between $z_0$ and that part of $z_0$ which is independent of $z_1$ and $z_2$, we shall, as in the preceding derivation, use the notation $k_{0.12}$. Though $r_{0.12}$ would be an entirely logical notation for this, it will not be used in this text because of its earlier use in the opposite sense. Utilizing the principle inherent in [10:16] and [10:17],


$$V(z_0) = V(z_{0\Delta12}) + V(z_{0.12})$$

$$1 = r_{0\Delta12}^2 + k_{0.12}^2 \qquad [11:63]$$

Some computational procedures yield $k_{0.12}^2$, in which case we compute $r_{0\Delta12}^2$ from the equation

$$r_{0\Delta12}^2 = 1 - k_{0.12}^2 \qquad [11:64]$$


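Other computational procedures yield the $\beta$ regression coefficients, in which case it is simple to compute the correlation between $z_0$ and $\bar{z}_0$, which is identical with $r_{0\Delta12}$, thus, when we note that $\sigma(z_{0\Delta12}) = r_{0\Delta12}$:

$$r_{0\Delta12} = r_{0(0\Delta12)} = \frac{\sum z_0(\beta_1z_1 + \beta_2z_2)}{N\,\sigma(z_{0\Delta12})} = \frac{\beta_1r_{01} + \beta_2r_{02}}{r_{0\Delta12}}$$

$$r_{0\Delta12}^2 = \beta_1r_{01} + \beta_2r_{02} \qquad [11:65]$$
Computational formula for $r_{0\Delta12}^2$ knowing $\beta$ regression coefficients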
The variance error of $z_0$, as estimated by means of [11:57], is $k_{0.12}^2$, and of $X_0$, as estimated by means of [11:56], is

$$V_{0.12} = V_0k_{0.12}^2 = V_0(1 - r_{0\Delta12}^2) \qquad [11:66]$$
Variance error of estimate, 3 variables
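
The formulas [11:61], [11:61a], [11:62], and [11:65] can be tied together numerically. The following is a minimal sketch; the function name and the three illustrative correlations are hypothetical, not data from the text.

```python
def three_variable(r01, r02, r12):
    """Standard-score regression weights and related quantities for two predictors."""
    d = 1.0 - r12 * r12
    beta1 = (r01 - r02 * r12) / d                                      # [11:61a]
    beta2 = (r02 - r01 * r12) / d                                      # [11:61]
    k2 = (1 - r01**2 - r02**2 - r12**2 + 2 * r01 * r02 * r12) / d      # [11:62]
    r2 = beta1 * r01 + beta2 * r02                                     # [11:65]
    return beta1, beta2, k2, r2

beta1, beta2, k2, r2 = three_variable(0.60, 0.50, 0.30)
print(round(beta1, 4), round(beta2, 4), round(k2, 4), round(r2, 4))
```

The last two values add to one, as [11:63] requires, so either computational route gives the same multiple correlation.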


Corresponding to the analysis of variance equation [11:63] is the following which involves raw $X_0$ scores:

$$V_0 = V_{0\Delta12} + V_{0.12}$$

$$V_0 = V_0r_{0\Delta12}^2 + V_0(1 - r_{0\Delta12}^2) \qquad [11:67]$$
Analysis of the variance of $X_0$ scores

$$N-1 = 2 + (N-3) \qquad [11:68]$$
Degrees of freedom equation


That the residual variance $V_{0.12}$ has (N-3) degrees of freedom is obvious when we note that the following three linear restrictions, and none other, have been imposed upon the $X_0$ scores:

$$NM_0 = \sum X_0; \qquad Nc_{01} = \sum (X_0 - M_0)x_1; \qquad Nc_{02} = \sum (X_0 - M_0)x_2$$

The nonlinear restriction $NV_0 = \sum (X_0 - M_0)^2$ is not involved in $a$, $b_1$, or $b_2$. $b_1$ may be written

$$b_1 = \beta_1\frac{\sigma_0}{\sigma_1} = \frac{c_{01} - \dfrac{c_{02}c_{12}}{V_2}}{V_1(1 - r_{12}^2)}$$

a function not involving $\sigma_0$ or $V_0$. Similarly for $b_2$. We can get the number of degrees of freedom of $V_{0\Delta12}$ by subtracting (N-3) from (N-1), or by noting that the regression equation [11:56] has two degrees of freedom due to the two parameters $b_1$ and $b_2$, for $M_0$, the additional parameter involved in $a$, has already established the degrees of freedom of $V_0$ as (N-1), not N.
The variance ratio,

$$F_{2,N-3} = \frac{V_{0\Delta12}/2}{V_{0.12}/(N-3)} = \frac{r_{0\Delta12}^2/2}{(1 - r_{0\Delta12}^2)/(N-3)} \qquad [11:69]$$

provides a test for the significance of $(r_{0\Delta12} - 0)$.
In the special case in which $r_{12} = 0$, a situation which can frequently be brought about by an appropriate design of an experiment, we note that all of the correlations between $x_1$, $x_2$, and $x_{0.12}$ are equal to zero, so that we have:

$$x_0 = x_{0\Delta12} + x_{0.12} = b_1x_1 + b_2x_2 + x_{0.12}$$

$$V_0 = b_1^2V_1 + b_2^2V_2 + V_{0.12} \qquad [11:70]$$
Analysis of $V_0$ when $x_1$ and $x_2$ are independent, 3 variables

$$N-1 = 1 + 1 + (N-3) \qquad [11:71]$$
Degrees of freedom equation

The variance ratios,

$$F_{1,N-3} = \frac{b_1^2V_1}{V_{0.12}/(N-3)}, \qquad F_{1,N-3} = \frac{b_2^2V_2}{V_{0.12}/(N-3)} \qquad [11:72]$$
Variance ratios when predictors are uncorrelated

provide tests for the significance of $(b_1 - 0)$ and of $(b_2 - 0)$. If we desire a test for the deviation of $b_1$ from $\tilde{b}_1$, a value not zero, we have,

$$F_{1,N-3} = \frac{(b_1 - \tilde{b}_1)^2V_1}{V_{0.12}/(N-3)} \qquad [11:73]$$
(predictors independent) (cf. [11:18])


The standard score regression coefficients, $\beta_1$ and $\beta_2$, may be interpreted in connection with partial correlation, which is the correlation between parts of variables. If $z_{0.2}$ is that part of $z_0$ that is independent of $z_2$ it is given by $z_{0.2} = z_0 - r_{02}z_2$. Its variance is $k_{0.2}^2 = 1 - r_{02}^2$. Similarly $z_{1.2} = z_1 - r_{12}z_2$, and the variance of $z_{1.2}$ is $k_{1.2}^2 = 1 - r_{12}^2$. The correlation between $z_{0.2}$ and $z_{1.2}$ is written $r_{01.2}$ and is read "the correlation between variables 0 and 1 independent of the effect of variable 2," or "the correlation between variables 0 and 1 for variable 2 constant,"

$$r_{01.2} = \frac{\sum z_{0.2}z_{1.2}}{N\,k_{0.2}k_{1.2}} = \frac{r_{01} - r_{02}r_{12}}{\sqrt{1-r_{02}^2}\,\sqrt{1-r_{12}^2}} \qquad [11:74]$$
Partial correlation coefficient, 3 variables


Utilizing [11:61a] we obtain

$$\beta_1 = \beta_{01.2} = r_{01.2}\frac{k_{0.2}}{k_{1.2}} \qquad [11:75]$$
Relationship between regression and partial correlation coefficients, standard scores

$$b_1 = b_{01.2} = r_{01.2}\frac{\sigma_{0.2}}{\sigma_{1.2}} \qquad [11:76]$$
Relation between regression and partial correlation coefficients
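
The agreement between [11:61a] and [11:75] can be seen with a small computation. This is an illustrative sketch only; the function names and the three correlations are hypothetical.

```python
import math

def partial_r(r01, r02, r12):
    """Partial correlation of variables 0 and 1 with variable 2 constant, as in [11:74]."""
    return (r01 - r02 * r12) / math.sqrt((1 - r02**2) * (1 - r12**2))

def beta_from_partial(r01, r02, r12):
    """Standard-score regression weight recovered from the partial correlation, as in [11:75]."""
    k02 = math.sqrt(1 - r02**2)   # alienation of 0 from 2
    k12 = math.sqrt(1 - r12**2)   # alienation of 1 from 2
    return partial_r(r01, r02, r12) * k02 / k12

r01, r02, r12 = 0.60, 0.50, 0.30
direct = (r01 - r02 * r12) / (1 - r12**2)   # [11:61a]
print(round(partial_r(r01, r02, r12), 4), round(beta_from_partial(r01, r02, r12), 4), round(direct, 4))
```

The last two printed values coincide, which is the identity the text derives.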


Yule (1907) and Fisher (1923-24) have shown the similarity in the properties of and the distribution of the partial correlation coefficient to those of a total correlation coefficient. The number of degrees of freedom in the partial correlation coefficient is less than in the total correlation coefficient by the number of secondary subscripts (those after the point) in the partial correlation coefficient.
If $x_{1.2}$ were used to estimate $x_{0.2}$, the regression equation would be

$$\bar{x}_{0.2} = r_{01.2}\frac{\sigma_{0.2}}{\sigma_{1.2}}\,x_{1.2} = b_{01.2}\,x_{1.2} \qquad [11:77]$$


Thus we note that the proper weight to attach to $x_1$, in [11:56a], in estimating $x_0$ is identically that which would attach to that part of $x_1$ which is independent of $x_2$ when estimating that part of $x_0$ which is independent of $x_2$. Similarly for the other independent variable.

Reference to [11:77] informs us that $b_{01.2}$ is a regression coefficient in a two-variable problem in which the variables are $x_{0.2}$ and $x_{1.2}$, each having N-2 degrees of freedom, so that a precise variance ratio test is given by [10:39], which becomes

$$F_{1,N-3} = \frac{(b_{01.2} - \tilde{b}_{01.2})^2\,V_{1.2}\,(N-3)}{V_{0.2}(1 - r_{01.2}^2)}$$

Since $V_{1.2} = V_1k_{1.2}^2$ and $V_{0.2}(1 - r_{01.2}^2) = V_{0.12}$,

$$F_{1,N-3} = \frac{(b_{01.2} - \tilde{b}_{01.2})^2\,V_1k_{1.2}^2\,(N-3)}{V_{0.12}} \qquad [11:78]$$
F to test $(b_{01.2} - \tilde{b}_{01.2})$. See [12:48]


The parent population must here be conceived as one in which the identical $x_1$ and $x_2$ values as paired in this sample are repeated upon all successive samplings. Under these conditions of sampling there will result a succession of values of $b_{01.2}$, and it is the variability of these that is tested by [11:78]. Since this variance ratio has one degree of freedom in the numerator variance, we may write

$$V(b_{01.2}) = \frac{V_0k_{0.12}^2}{V_1k_{1.2}^2\,(N-3)} \qquad [11:79]$$
Variance error of regression coefficient, three variables
Determinantal expression of multiple correlation relationships: We herewith give these relationships for the three-variable problem and state, without proof, that they also hold for the general case of k variables, if expressed in connection with a major determinant, of the type $\Delta$ herewith, but expanded to include the added variables. For the three-variable case we have:

$$\Delta = \begin{vmatrix} 1 & r_{01} & r_{02} \\ r_{01} & 1 & r_{12} \\ r_{02} & r_{12} & 1 \end{vmatrix} \qquad [11:80]$$
Major correlation determinant, 3 variables

$$\begin{vmatrix} r_{01} & r_{12} \\ r_{02} & 1 \end{vmatrix}$$
Minor obtained by deleting the 0 row and 1 column

Other minors are similarly designated and $\Delta$, with subscripts, stands for a co-factor, which is a minor with a sign factor attached.

$$1 - r_{0\Delta12}^2 = k_{0.12}^2 = \frac{\Delta}{\Delta_{00}} \qquad [11:81]$$
Multiple alienation coefficient, or variance error of estimated standard scores

$$\beta_{01.2} = \frac{-\Delta_{01}}{\Delta_{00}} \qquad [11:82]$$

$$\beta_{10.2} = \frac{-\Delta_{10}}{\Delta_{11}} \qquad [11:82a]$$

$$r_{01.2} = \sqrt{\beta_{01.2}\,\beta_{10.2}} = \sqrt{\frac{\Delta_{01}\Delta_{10}}{\Delta_{00}\Delta_{11}}} \qquad [11:83]$$

$\beta_{01.2}$ and $\beta_{10.2}$ have the same sign, and the sign that attaches to the radical is this sign. These two $\beta$'s are conjugate regression coefficients.
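
The determinantal relationships can be verified against the direct three-variable formulas. The sketch below is illustrative; the helper names `det` and `cofactor` and the three correlations are hypothetical.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * v * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, v in enumerate(m[0]))

def cofactor(m, i, j):
    """Minor from deleting row i and column j, with the sign factor attached."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

r01, r02, r12 = 0.60, 0.50, 0.30
delta = [[1.0, r01, r02],
         [r01, 1.0, r12],
         [r02, r12, 1.0]]

k2 = det(delta) / cofactor(delta, 0, 0)                      # [11:81]
beta01_2 = -cofactor(delta, 0, 1) / cofactor(delta, 0, 0)    # [11:82]
beta10_2 = -cofactor(delta, 1, 0) / cofactor(delta, 1, 1)    # [11:82a]
r01_2 = math.copysign(math.sqrt(beta01_2 * beta10_2), beta01_2)   # [11:83]
print(round(k2, 4), round(beta01_2, 4), round(beta10_2, 4), round(r01_2, 4))
```

The same pattern extends to more variables by enlarging the matrix, though expansion by cofactors becomes slow beyond a handful of variables.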


SECTION 4. NONLINEAR REGRESSION 


Notation illustrated in connection with a three-variable linear regression situation. The theoretical issues and the illustrative problem will refer to a quadric or second-degree parabola, but the method can be immediately extended to higher-degree parabolic regression. It is, in fact, so slight a modification of multiple-regression procedure as to call for few new concepts. A new notation will be found convenient. By illustrating this for a three-variable linear multiple-regression problem, the similarity between that problem and parabolic regression will be made apparent. In connection with the three-variable linear-regression problem there are slightly different procedures dependent upon whether the task set is to compute the constants of [11:56], [11:56a], or [11:57]. The major determinant pertinent to the determination of the constants in [11:57] will be called $\Delta$, or $A$, that pertinent to [11:56a] $\delta$, that pertinent to [11:56] $d$, or $D$, depending upon whether means or summations are employed. We employ $p$ to designate a product moment involving $x$, or deviation, scores, $P$ a product moment involving $X$, or gross scores, and $\Sigma$ a summation of gross scores. $\Sigma$, $P$, and $p$ will each, in this three-variable problem, have three subscripts, the first indicating the power to which the first variable ($X_0$ or $x_0$) is raised, the second the power to which the second variable ($X_1$ or $x_1$) is raised, and the third the power to which the third variable ($X_2$ or $x_2$) is raised. Thus

$$p_{ijk} = \frac{\sum x_0^i x_1^j x_2^k}{N}; \qquad P_{ijk} = \frac{\sum X_0^i X_1^j X_2^k}{N}; \qquad \Sigma_{ijk} = \sum X_0^i X_1^j X_2^k$$

Such substitutions as $\Sigma_{000} = N$; $P_{100} = M_0$; $p_{110} = c_{01}$; $p_{020} = V_1$; etc., can be made whenever desired.
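$$\Delta = A = \begin{vmatrix} 1 & r_{01} & r_{02} \\ r_{01} & 1 & r_{12} \\ r_{02} & r_{12} & 1 \end{vmatrix} \qquad [11:80]$$

$$\delta = \begin{vmatrix} p_{200} & p_{110} & p_{101} \\ p_{110} & p_{020} & p_{011} \\ p_{101} & p_{011} & p_{002} \end{vmatrix} = \begin{vmatrix} V_0 & c_{01} & c_{02} \\ c_{01} & V_1 & c_{12} \\ c_{02} & c_{12} & V_2 \end{vmatrix} \qquad [11:84]$$

$$d = \begin{vmatrix} P_{200} & P_{100} & P_{110} & P_{101} \\ P_{100} & P_{000} & P_{010} & P_{001} \\ P_{110} & P_{010} & P_{020} & P_{011} \\ P_{101} & P_{001} & P_{011} & P_{002} \end{vmatrix} = \begin{vmatrix} P_{200} & M_0 & P_{110} & P_{101} \\ M_0 & 1 & M_1 & M_2 \\ P_{110} & M_1 & P_{020} & P_{011} \\ P_{101} & M_2 & P_{011} & P_{002} \end{vmatrix} \qquad [11:85]$$

$$D = \begin{vmatrix} \Sigma_{200} & \Sigma_{100} & \Sigma_{110} & \Sigma_{101} \\ \Sigma_{100} & \Sigma_{000} & \Sigma_{010} & \Sigma_{001} \\ \Sigma_{110} & \Sigma_{010} & \Sigma_{020} & \Sigma_{011} \\ \Sigma_{101} & \Sigma_{001} & \Sigma_{011} & \Sigma_{002} \end{vmatrix} \quad \text{in which } \Sigma_{000} = N \qquad [11:86]$$

The complete solution, starting with $\Delta$, is as indicated in Section 3. The solution starting with $\delta$ or $d$ will be obvious from that based upon $D$ which is given herewith.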

When using subscripts we will call the first row of $D$ the 0-row; the second row the a-row; the third row the 1-row; and the fourth row the 2-row; and similarly for the columns. Thus $D_{01}$ is the cofactor of the element $\Sigma_{110}$. Note that the determination of the sign factor, which is simple in any case, must be as though the numbering were 0, 1, 2, 3 and not as here used, 0, a, 1, 2.

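The various statistics, most of which are unnecessary if the regression equation only is desired, are

$$M_0 = \frac{\Sigma_{100}}{N}; \qquad M_1 = \frac{\Sigma_{010}}{N}; \qquad M_2 = \frac{\Sigma_{001}}{N} \qquad [11:87]$$

$$V_0 = \frac{\Sigma_{200}}{N} - M_0^2; \qquad V_1 = \frac{\Sigma_{020}}{N} - M_1^2; \qquad V_2 = \frac{\Sigma_{002}}{N} - M_2^2 \qquad [11:88]$$

$$r_{01} = \frac{\Sigma_{110}/N - M_0M_1}{\sigma_0\sigma_1}; \qquad r_{02} = \frac{\Sigma_{101}/N - M_0M_2}{\sigma_0\sigma_2}; \qquad r_{12} = \frac{\Sigma_{011}/N - M_1M_2}{\sigma_1\sigma_2} \qquad [11:89]$$

$$a = \frac{-D_{0a}}{D_{00}} \quad [11:90]; \qquad b_1 = \frac{-D_{01}}{D_{00}} \quad [11:91]; \qquad b_2 = \frac{-D_{02}}{D_{00}} \quad [11:92]$$

$$V_{0.12} = V_0k_{0.12}^2 = V_0(1 - r_{0\Delta12}^2) = \frac{D}{ND_{00}} \qquad [11:93]$$
Variance error of estimate

$$r_{0\Delta12}^2 = 1 - \frac{D}{NV_0D_{00}} \qquad [11:94]$$
Multiple correlation

$$r_{01.2} = \sqrt{b_{01.2}\,b_{10.2}} = \sqrt{\frac{D_{01}D_{10}}{D_{00}D_{11}}} \qquad [11:95]$$
Partial correlation coefficient (sign of $r_{01.2}$ is that of $b_{01.2}$)

$$r_{02.1}^2 = \frac{D_{02}^2}{D_{00}D_{22}}; \qquad r_{12.0}^2 = \frac{D_{12}^2}{D_{11}D_{22}}$$

Though the proof is omitted, it can be stated that the preceding formulas of this Section hold for problems of any number of variables by simply adding the necessary secondary subscripts to $V_{0.12}$, $r_{0\Delta12}$, $r_{01.2}$, etc.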
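
The solution based upon $D$ can be sketched in a few lines. The example assumes the row and column order 0, a, 1, 2 described above and the cofactor ratios [11:90] to [11:93] as written here; the helper names and the small data set are hypothetical, and the criterion is constructed to lie exactly on a plane so that the answer is known in advance.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * v * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, v in enumerate(m[0]))

def cofactor(m, i, j):
    """Signed minor; positions are counted 0, 1, 2, 3 when fixing the sign."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

# Hypothetical gross scores; X0 is built as exactly 1 + 2*X1 + 3*X2.
X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 4, 3, 6]
X0 = [1 + 2 * u + 3 * v for u, v in zip(X1, X2)]
N = len(X0)

# Rows and columns in the order 0, a, 1, 2; the "a" variable is the constant 1.
columns = [X0, [1] * N, X1, X2]
D = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]

D00 = cofactor(D, 0, 0)
a = -cofactor(D, 0, 1) / D00      # [11:90]
b1 = -cofactor(D, 0, 2) / D00     # [11:91]
b2 = -cofactor(D, 0, 3) / D00     # [11:92]
V0_12 = det(D) / (N * D00)        # [11:93]
print(a, b1, b2, V0_12)
```

Because the criterion was built without error, the recovered constants are those used to construct it and the variance error of estimate vanishes.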

Parabolic regression. If the third variable is simply the second squared, the regression equation is

$$\bar{X}_0 = X_{0\Delta1,1^2} = A + BX_1 + CX_1^2 \qquad [11:96]$$
Second-degree parabolic regression

The linear estimate, $X_{0\Delta1}$, may be written

$$X_{0\Delta1} = a + bX_1 \qquad [11:97]$$
Linear regression

In the case of [11:96] the correlation obtained may be designated $r_{0\Delta1,1^2}$ and called a second-degree parabolic correlation coefficient.

If we substitute $X_1^2$ for $X_2$ in [11:56] the major determinant [11:86] then becomes

$$D = \begin{vmatrix} \Sigma_{20} & \Sigma_{10} & \Sigma_{11} & \Sigma_{12} \\ \Sigma_{10} & \Sigma_{00} & \Sigma_{01} & \Sigma_{02} \\ \Sigma_{11} & \Sigma_{01} & \Sigma_{02} & \Sigma_{03} \\ \Sigma_{12} & \Sigma_{02} & \Sigma_{03} & \Sigma_{04} \end{vmatrix} \quad \text{in which } \Sigma_{ij} = \sum X_0^i X_1^j \qquad [11:98]$$

and thence the procedure is as before. The calculation of the variance error, of the parabolic correlation, and of the regression constants $A$, $B$, and $C$ follows the pattern of [11:93], [11:94], [11:90], [11:91], and [11:92], but the variances of $B$ and $C$ do not follow the pattern of $b_1$ and $b_2$ [11:19]. $C$ could be written $b_{01^2.1}$ and read, "the regression of $X_0$ upon $X_1^2$ for fixed values of $X_1$." However, when $X_1$ is fixed of course $X_1^2$ is also fixed. We are unable to determine the variance of $b_{01^2.1}$ without fixing $b_{01.1^2}$ (see Chapter XII), but now a different and more powerful procedure is open to us.

When $X_0$ is estimated from a knowledge of $X_1$, we have [11:97] and we can analyze $V_0$ into independent parts,

$$V_0 = V_{0\Delta1} + V_{0.1} \qquad [11:99]$$

$$N-1 = 1 + (N-2) \qquad [11:100]$$
Corresponding d.o.f. equation

$V_{0\Delta1}$ is the variance of $X_0$ dependent upon $X_1$ and equals $V_0r_{01}^2$. $V_{0\Delta1}$ is the variance of the points on the linear regression line and $V_{0.1}$ is the variance of the divergences from these points.

In the case of second-degree parabolic regression we have [11:96] and we can analyze $V_0$ into the following independent parts:

$$V_0 = V_{0\Delta1,1^2} + V_{0.1,1^2} \qquad [11:101]$$

$$N-1 = 2 + (N-3) \qquad [11:102]$$
Corresponding d.o.f. equation

$V_{0\Delta1,1^2}$ is the variance of the points on the second-degree parabolic regression line and $V_{0.1,1^2}$ is the variance of the divergences from these points.

Also $V_{0.1}$ of [11:99] can be analyzed into independent parts: a part which can be estimated from $X_1^2$ but not from $X_1$ and a part which cannot be so estimated:

$$V_{0.1} = V_{(0.1)\Delta(1^2.1)} + V_{0.1,1^2} \qquad [11:103]$$

Substituting in [11:99] we have

$$V_0 = V_{0\Delta1} + V_{(0.1)\Delta(1^2.1)} + V_{0.1,1^2}$$

$$= V_0r_{01}^2 + V_0(r_{0\Delta1,1^2}^2 - r_{01}^2) + V_0(1 - r_{0\Delta1,1^2}^2) \qquad [11:104]$$

$$N-1 = 1 + 1 + (N-3) \qquad [11:105]$$
d.o.f. equation

$V_{0\Delta1}$ = the variance of the points on the linear regression line.

$V_{(0.1)\Delta(1^2.1)}$ = the variance of the differences between the points on the second-degree line and the points on the linear line.

$V_{0.1,1^2}$ = the residual variance, or that of the divergences from the second-degree regression line.
The variance ratio test of the need of a linear regression line, $b$ in [11:97] $\neq 0$, is

$$F_{1,(N-2)} = \frac{r_{01}^2\,(N-2)}{1 - r_{01}^2} \qquad [11:106]$$
F to test need of at least linear regression

Again, we have

$$F_{1,(N-3)} = \frac{(r_{0\Delta1,1^2}^2 - r_{01}^2)(N-3)}{1 - r_{0\Delta1,1^2}^2} \qquad [11:107]$$
F to test need of at least second-degree regression

Similarly,

$$F_{1,(N-4)} = \frac{(r_{0\Delta1,1^2,1^3}^2 - r_{0\Delta1,1^2}^2)(N-4)}{1 - r_{0\Delta1,1^2,1^3}^2} \qquad [11:108]$$
F to test need of at least third-degree regression

And so forth to any degree of parabolic regression that one may desire to test.
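
The succession of tests [11:106], [11:107], [11:108] follows one pattern, which the sketch below makes explicit. The function name and the three squared correlations are hypothetical values chosen for illustration.

```python
def degree_f_tests(r2_by_degree, n):
    """F ratios for the need of each successive degree of parabolic regression.

    r2_by_degree[d - 1] is the squared correlation obtained with a parabola of degree d.
    Returns tuples (degree, numerator d.o.f., denominator d.o.f., F).
    """
    results = []
    previous = 0.0
    for degree, r2 in enumerate(r2_by_degree, start=1):
        df2 = n - degree - 1                       # N-2, N-3, N-4, ...
        f = (r2 - previous) * df2 / (1.0 - r2)     # one d.o.f. in the numerator
        results.append((degree, 1, df2, f))
        previous = r2
    return results

for degree, df1, df2, f in degree_f_tests([0.36, 0.40, 0.41], n=50):
    print(degree, df1, df2, round(f, 3))
```

Each F would then be referred to a table of the variance ratio with 1 and N-j-1 degrees of freedom.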


If the $X_1$ variable is grouped into k classes, a parabola of degree (k-1) will pass exactly through the means of the arrays so that a parabola of higher degree will give no better fit. The F test will show this, but prior to this, if successively higher-degree parabolas have been employed, a P-from-F > .5 may have been obtained. This would indicate that the degree of the parabola used was too high, a better fit being gotten than was to be expected from the size of the sample dealt with. When successive tests are made, one should stop with the parabola of such degree that P from F at this point is approximately equal to .5.


Another procedure, leading to the same outcome and frequently less time-consuming, is to compute $\epsilon^2$, the unbiased correlation ratio squared, and terminate the testing of higher and higher-degree parabolas when the $r^2$ obtained is negligibly less than $\epsilon^2$. Practical considerations might assert that an $r^2$ of, say, .800, was negligibly less than an $\epsilon^2$ of .802, even though the sample was so large that P from F at this point < .5.
SECTION 5. THE CORRELATION RATIO


This method is of specific value in testing the adequacy of linear and other regression lines. Also, it is a very general method in that it applies when one variable is quantitative and the other qualitative.

In the case of linear regression, the predictand, $X_0$, can be expressed as the sum of the prediction, $X_{0\Delta1}$, and an error of estimate, $X_{0.1}$. The four equations following have already been established:

$$X_0 = X_{0\Delta1} + X_{0.1}$$

$$V_0 = V_{0\Delta1} + V_{0.1} \qquad \text{Analysis of variance}$$

$$N-1 = 1 + (N-2) \qquad \text{d.o.f. equation}$$

$$r_{01}^2 = r_{0\Delta1}^2 = V_{0\Delta1}/V_0$$

For quadric regression we have:

$$X_0 = X_{0\Delta1,1^2} + X_{0.1,1^2}$$

$$V_0 = V_{0\Delta1,1^2} + V_{0.1,1^2}$$

$$N-1 = 2 + (N-3)$$

$$r_{0\Delta1,1^2}^2 = V_{0\Delta1,1^2}/V_0$$

Similarly for higher parabolic regression up to that of k-1 degree, at which point a perfect fit is attained, that is, the regression line passes exactly through the means of the k successive arrays. At this point we have:
$$X_0 = X_{0\Delta1,1^2,\ldots,1^{k-1}} + X_{0.1,1^2,\ldots,1^{k-1}}$$

$$V_0 = V_{0\Delta1,1^2,\ldots,1^{k-1}} + V_{0.1,1^2,\ldots,1^{k-1}}$$

$$N-1 = (k-1) + (N-k)$$

$$r_{0\Delta1,1^2,\ldots,1^{k-1}}^2 = V_{0\Delta1,1^2,\ldots,1^{k-1}}/V_0$$

The variance $V_{0\Delta1,1^2,\ldots,1^{k-1}}$ is simply that of the means of the $X_0$'s for the k successive arrays of $X_1$'s. Designating these means $M_{0i}$, in which i = 1, 2, ... k, and the squared correlation coefficient in this case of perfect fit $\eta_{01}^2$, we have

$$\eta_{01}^2 = \frac{V(M_{0i})}{V_0} = \frac{V_{0\Delta1,1^2,\ldots,1^{k-1}}}{V_0} \qquad [11:109]$$
The raw correlation ratio squared

To obtain $V(M_{0i})$ compute the k means of the arrays, weight each by the number of cases in the array, and compute the variance.
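
The computation just described is brief enough to set out in full. The sketch below follows [11:109]; the function name and the three small arrays are hypothetical.

```python
def correlation_ratio_squared(arrays):
    """Raw correlation ratio squared: weighted variance of array means over total variance.

    arrays is a list of lists; each inner list holds the X0 scores falling in one X1 class.
    """
    scores = [x for arr in arrays for x in arr]
    n = len(scores)
    m0 = sum(scores) / n
    v0 = sum((x - m0) ** 2 for x in scores) / n
    # Each array mean is weighted by the number of cases in the array.
    v_means = sum(len(arr) * (sum(arr) / len(arr) - m0) ** 2 for arr in arrays) / n
    return v_means / v0

eta2 = correlation_ratio_squared([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(eta2, 3))
```

Note that only class membership enters the computation; the classes could be relabeled or reordered without changing the result.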

The numerical values, if there are any, that attach to the k class indexes of the $X_1$'s have not entered into this computation so no ordered or quantitative relationship between the $X_1$ classes enters into this measure. Accordingly the correlation ratio is applicable when a quantitative relationship between the $X_1$ classes is unknown.

Also, of course, if there is a known quantitative relationship between the $X_1$ classes, the employment of $\eta^2$ uses but a part of the available information.

It is not to be expected that any regression line of less degree than k-1 will exactly hit the means of all the arrays. Accordingly $V_{0\Delta1}$, $V_{0\Delta1,1^2}$, $V_{0\Delta1,1^2,1^3}$, ..., $V_{0\Delta1,1^2,\ldots,1^{k-2}}$ constitute an increasing series whose limit is $V(M_{0i})$. Also $r_{0\Delta1}^2$, $r_{0\Delta1,1^2}^2$, ..., $r_{0\Delta1,1^2,\ldots,1^{k-2}}^2$ constitute an increasing series whose limit is $\eta_{01}^2$. The difference between $\eta_{01}^2$ and $r_{0\Delta1,1^2,\ldots,1^j}^2$ is a measure of the adequacy of the j-degree regression line to account for the relationship between $X_0$ and $X_1$.

However, when we bear in mind that the relationship that we seek to discover is that maintaining in the population and not that in the sample we must believe that a regression line that fits the sample perfectly overstates the intimacy of the relationship in the population. We should therefore discount, or shrink, both $r^2$ and $\eta^2$ to get estimates of the population values before we compare them. Both $r^2$ and $\eta^2$ are equal to expressions of the sort

$$1 - \frac{V(\text{errors of estimate})}{V_0}$$

A similar expression using estimates of the population variances will yield the desired shrunken values of $r^2$ and $\eta^2$, which we label $\tilde{r}^2$ and $\epsilon^2$. For estimated population variances we have:

$$\tilde{V}_0 = V_0\frac{N}{N-1} \qquad [11:110]$$

$$\tilde{V}_{0.1} = V_{0.1}\frac{N}{N-2} \qquad [11:111]$$

$$\tilde{V}_{0.1,1^2} = V_{0.1,1^2}\frac{N}{N-3} \qquad [11:112]$$

$$\tilde{V}_{0.1,1^2,\ldots,1^j} = V_{0.1,1^2,\ldots,1^j}\frac{N}{N-j-1} \qquad [11:113]$$

Accordingly the shrunken values of the squared correlations are

$$\tilde{r}_{01}^2 = \frac{(N-1)\,r_{01}^2 - 1}{N-2} \qquad [11:114]$$
Correction in $r^2$ for shrinkage, linear regression, cf. [12:36]

$$\tilde{r}_{0\Delta1,1^2}^2 = \frac{(N-1)\,r_{0\Delta1,1^2}^2 - 2}{N-3} \qquad [11:115]$$
Correction in $r^2$ for shrinkage, quadric regression

$$\tilde{r}_{0\Delta1,1^2,\ldots,1^j}^2 = \frac{(N-1)\,r_{0\Delta1,1^2,\ldots,1^j}^2 - j}{N-j-1} \qquad [11:116]$$
Correction in $r^2$ for shrinkage, regression of jth degree, cf. [12:36]

$$\epsilon_{01}^2 = \frac{(N-1)\,\eta_{01}^2 - k + 1}{N-k} \qquad [11:117]$$
Correction in $\eta^2$ for shrinkage, k classes (Kelley, 1935 and Peters and Van Voorhis, 1940, Ch. XI and XIII), also called correction for fine categories

If a series of $\tilde{r}^2$ values determined from linear, quadric, cubic, etc. regressions are computed, that regression yielding an $\tilde{r}^2$ most nearly equal to $\epsilon^2$ is the optimal regression and the $\tilde{r}^2$ corresponding is optimal. For $\tilde{r}^2 < \epsilon^2$ the regression employed does not get all that is of significance, and for $\tilde{r}^2 > \epsilon^2$ the regression used has presumably profited by chance.
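
The shrinkage corrections [11:116] and [11:117] are simple enough to compute side by side. The sketch below is illustrative; the function names and the sample values (N = 50, six classes) are hypothetical.

```python
def shrunken_r2(r2, n, degree):
    """Shrunken squared correlation for a parabola of the given degree, as in [11:116]."""
    return ((n - 1) * r2 - degree) / (n - degree - 1)

def epsilon_squared(eta2, n, k):
    """Unbiased correlation ratio squared for k classes, as in [11:117]."""
    return ((n - 1) * eta2 - k + 1) / (n - k)

n, k = 50, 6
print(round(shrunken_r2(0.36, n, 1), 4),      # linear
      round(shrunken_r2(0.40, n, 2), 4),      # quadric
      round(epsilon_squared(0.45, n, k), 4))  # array means
```

With degree set to k-1 the first function reduces to the second, which reflects the fact that a parabola of degree k-1 passes through the k array means.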

We may also test any obtained $r^2$ by the variance ratio

$$F_{k-j-1,\,N-k} = \frac{(\eta_{01}^2 - r_{0\Delta1,1^2,\ldots,1^j}^2)/(k-j-1)}{(1 - \eta_{01}^2)/(N-k)} \qquad [11:118]$$
To test adequacy of $r_{0\Delta1,1^2,\ldots,1^j}^2$

As here employed j equals the degree of the parabolic regression employed and k is the number of classes in the predictor variable.

For some purposes the variance error of $\epsilon^2$, or of $\epsilon$, will suffice. As derived by Kelley (op. cit.) these are


(1-e?)? 2(k-1) 


N-l : N-k 


ЗҮДЄМ 13072177: 1419) 


У( Е?) = 


$$V(\epsilon) = \frac{V(\epsilon^2)}{4\epsilon^2} \qquad \text{(Valid only when } \epsilon^2 \text{ is not small)} \qquad [11:120]$$


Multiple and partial correlation ratios may 
be computed for multivariate categorical series. 
A three-variable illustration is given by Kelley 
in Rietz Handbook (1924). 


CHAPTER XII


THE GENERAL MULTIPLE LINEAR 
REGRESSION PROBLEM 


SECTION 1. A STATEMENT OF RELATIONSHIPS IN CONNECTION 
WITH STANDARD SCORE VARIABLES 


Let it be desired to estimate $X_0$ from a knowledge of n independent measures, $X_1, X_2, \ldots, X_n$. The usual notation will be used for variances, standard deviations, correlations, covariances. Standard scores are defined as earlier and designated with the letter z: $z = (X-M)/\sigma$. The desired linear equation of estimate, or regression equation, is

$$\bar{X}_0 = X_{0\Delta12\ldots n} = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n \qquad [12:01]$$
Multiple regression equation

When involving deviation from the mean scores this becomes

$$\bar{x}_0 = x_{0\Delta12\ldots n} = b_1x_1 + b_2x_2 + \cdots + b_nx_n \qquad [12:02]$$

and when employing standard scores it becomes

$$\bar{z}_0 = z_{0\Delta12\ldots n} = \beta_1z_1 + \beta_2z_2 + \cdots + \beta_nz_n \qquad [12:03]$$
Standard score form of regression equation

in which there is no constant term, for were there one the estimate of $z_0$ corresponding to the mean of the right-hand member, treated as a single variable, would not equal zero, which we know must be the case from the two-variable problem.


$$b_i = \beta_i\frac{\sigma_0}{\sigma_i} \qquad [12:04]$$

$$a = M_0 - b_1M_1 - b_2M_2 - \cdots - b_nM_n \qquad [12:05]$$


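When [12:03] is employed the error of an estimate is $z_{0.12\ldots n}$:

$$z_{0.12\ldots n} = z_0 - z_{0\Delta12\ldots n} = z_0 - \beta_1z_1 - \beta_2z_2 - \cdots - \beta_nz_n \qquad [12:06]$$

The variance error of estimate, which is to be minimal, by the proper selection of the $\beta$'s, is

$$k_{0.12\ldots n}^2 = V(z_{0.12\ldots n}) = 1 + \beta_1^2 + \beta_2^2 + \cdots + \beta_n^2 - 2\beta_1r_{01} - 2\beta_2r_{02} - \cdots - 2\beta_nr_{0n} + 2\beta_1\beta_2r_{12} + \cdots \qquad [12:07]$$

In the two-variable problem, involving $z_0$ and $y$, the $y$ being a variable whose mean is zero and whose variance is $V_y$, we have

$$z_{0\Delta y} = r_{0y}\frac{1}{\sigma_y}\,y \quad\text{and}\quad V(z_{0\Delta y}) = r_{0y}^2 \qquad [12:08]$$

Accordingly, by treating the entire right-hand member of [12:03] as a single variable, we obtain

$$V(z_{0\Delta12\ldots n}) = r_{0\Delta12\ldots n}^2 \qquad [12:09]$$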
As with the two-variable problem,

$$z_0 = z_{0\Delta12\ldots n} + z_{0.12\ldots n} \qquad [12:10]$$

in which the two terms in the right-hand member are independent, so that, computing the variance of $z_0$ yields

$$1 = r_{0\Delta12\ldots n}^2 + k_{0.12\ldots n}^2 \qquad [12:11]$$

The error of estimate $(z_0 - z_{0\Delta12\ldots n})$ is uncorrelated with $z_1$, for otherwise $z_1$, already used in the regression equation [12:03], could be employed again to reduce the error of estimate, but this is impossible because the $\beta$'s in [12:03] are such as to yield a minimal variance error of estimate. We thus have, for the covariance between $z_1$ and $z_{0.12\ldots n}$,

$$c[(z_0 - z_{0\Delta12\ldots n})z_1] = c(z_0z_1) - c(z_{0\Delta12\ldots n}z_1) = 0$$

$$r_{01} = c(z_{0\Delta12\ldots n}z_1) \qquad [12:12]$$

Also

$$c(z_{0\Delta12\ldots n}z_1) = c[z_1(\beta_1z_1 + \beta_2z_2 + \cdots + \beta_nz_n)] = \beta_1V(z_1) + \beta_2c(z_1z_2) + \cdots + \beta_nc(z_1z_n)$$

or

$$r_{01} = \beta_1 + r_{12}\beta_2 + r_{13}\beta_3 + \cdots + r_{1n}\beta_n \qquad [12:13a]$$

Similarly

$$r_{02} = r_{12}\beta_1 + \beta_2 + r_{23}\beta_3 + \cdots + r_{2n}\beta_n \qquad [12:13b]$$

$$\cdots$$

$$r_{0n} = r_{1n}\beta_1 + r_{2n}\beta_2 + r_{3n}\beta_3 + \cdots + \beta_n \qquad [12:13n]$$
These n equations are the "condition equations" or "normal equations." Solved simultaneously, they yield the requisite values of the $\beta$'s.

Thence substitution in [12:04] and [12:05] provides the constants of equation [12:01], which is the form needed for practical use.

$r_{0\Delta12\ldots n}^2$ can be computed from $k_{0.12\ldots n}^2$, which can be gotten from [12:07], or, very expeditiously, from [12:14] following:

We note that

$$c(z_0z_{0\Delta12\ldots n}) = c[z_0(\beta_1z_1 + \beta_2z_2 + \cdots + \beta_nz_n)] = r_{01}\beta_1 + r_{02}\beta_2 + \cdots + r_{0n}\beta_n$$

but since, noting [12:08],

$$c(z_0z_{0\Delta12\ldots n}) = \sigma_0\,\sigma_{0\Delta12\ldots n}\,r_{0(0\Delta12\ldots n)} = r_{0\Delta12\ldots n}^2$$

we have

$$r_{0\Delta12\ldots n}^2 = \beta_1r_{01} + \beta_2r_{02} + \cdots + \beta_nr_{0n} \qquad [12:14]$$
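
The normal equations [12:13] and the short formula [12:14] can be illustrated with a small routine. The sketch below uses plain Gaussian elimination with back substitution, which is not the modified Doolittle layout of the next section; the function names and the correlations are hypothetical.

```python
def solve_normal_equations(r_pred, r_crit):
    """Solve [12:13] for the beta weights by Gaussian elimination with partial pivoting.

    r_pred is the n-by-n matrix of correlations among predictors (unit diagonal);
    r_crit is the list of correlations of each predictor with the criterion.
    """
    n = len(r_crit)
    a = [[float(v) for v in row] + [float(rc)] for row, rc in zip(r_pred, r_crit)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n + 1):
                a[row][c] -= factor * a[col][c]
    betas = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][c] * betas[c] for c in range(row + 1, n))
        betas[row] = (a[row][n] - tail) / a[row][row]
    return betas

def multiple_r_squared(betas, r_crit):
    """Squared multiple correlation from the beta weights, as in [12:14]."""
    return sum(b * r for b, r in zip(betas, r_crit))

betas = solve_normal_equations([[1.0, 0.3], [0.3, 1.0]], [0.6, 0.5])
print([round(b, 4) for b in betas], round(multiple_r_squared(betas, [0.6, 0.5]), 4))
```

With two predictors the result agrees with the closed forms [11:61], [11:61a], and [11:65] of the preceding chapter.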


Many methods have been devised for solving the simultaneous equations [12:13]. Among them are the following, which have distinctive features of one sort or another: The determinantal solution, as given in Section 3 of this chapter, parallels and reveals theoretical development and is not uneconomical if four or fewer variables are involved. The Doolittle (1878) method (see also Dwyer, 1941, Doolittle and, 1941, Solution), one of the many variations of which is given in the next section (this modification was suggested in part by an article by Cowden, 1943), is generally serviceable, though it becomes rather laborious with 20 variables or more. Aitken's (1937) method of pivotal condensation has much similarity with and many of the merits of Doolittle procedure. The Kelley-Salisbury (1926) (see also Kelley-McNemar, 1929, and Tolley-Ezekiel, 1927), or other iterative procedure, is designed for the many variable problem.


SECTION 2. A MODIFIED DOOLITTLE SOLUTION


We here give a four-variable problem solved 
by a modified Doolittle method, expressed in 
terms of symbols so that what has happened at 
every step is apparent. The multiple-regression 
problem involving more than four variables does 
not involve any principles not covered in the 
four-variable illustration. 

A Doolittle solution based upon gross scores 
can be made. This would immediately yield the 
equation [12:01]. There are two disadvantages 
in this procedure: It involves one more varia- 
ble, namely a, than a calculation based upon de- 
viation scores, and it yields an answer [12:01] 
which holds but part of the information commonly 
desired, for a knowledge of means and variances 
is frequently pertinent to a problem in addi- 
tion to knowledge of the regression equation. 
It is therefore judged to be generally economical 
to compute means, variances, and covariances in 
the first instance and then to perform a Doo- 
little solution with these measures, leading to 
equation [12:02]. Adding the constant a, [12:05] 
yields [12:01] which is generally the most serv- 
iceable form of the regression equation for 
practical use. It is readily established that 
the condition equations when deviation from mean 
scores are used differ slightly from those con- 
sequent to standard scores, [12:13], and are, 
for the four-variable problem, as follows: 


(1)  $b_1 V_1 + b_2 c_{12} + b_3 c_{13} = c_{01}$    cf. Table XII A row 10


(2)  $b_1 c_{12} + b_2 V_2 + b_3 c_{23} = c_{02}$    cf. Table XII A row 20


(3)  $b_1 c_{13} + b_2 c_{23} + b_3 V_3 = c_{03}$    cf. Table XII A row 30


In view of the fact that the coefficients of the
unknown b's enter symmetrically (a situation
leading to Gramian determinants) the solution
of these particular simultaneous equations becomes
relatively simple. In the modified Doolittle
layout of Table XII A the coefficients
$b_1$, $b_2$, $b_3$, and the equality sign have been
dropped and certain elements added (arising from
relationships in matrix algebra connected with
the identity matrix). The student can identify
the steps of the Doolittle process by reference
to steps (4) to (9) herewith. The Doolittle
method is simply a convenient arrangement of
steps for the solution of simultaneous equations
in which the coefficients of the unknowns are
symmetrical. Dividing the terms of equation (1)
by $(-V_1)$ we obtain


(4)  $-b_1 - b_2 b_{21} - b_3 b_{31} = -b_{01}$    cf. Table XII A row 1'


Multiply (1) by the second coefficient of (4),
namely $-b_{21}$, obtaining


(5a)  $-b_1 V_1 b_{21} - b_2 c_{12} b_{21} - b_3 c_{13} b_{21} = -c_{01} b_{21}$
(5b)  $-b_1 c_{12} - b_2 r_{12}^2 V_2 - b_3 r_{12} r_{13} \sigma_2 \sigma_3 = -r_{01} r_{12} \sigma_0 \sigma_2$


which, using a notation in which $\Delta$ in a subscript
means "dependent upon," is


(5)  $-b_1 c_{12} - b_2 V_{2\Delta 1} - b_3 c_{23\Delta 1} = -c_{02\Delta 1}$    cf. Table XII A row 21


Adding (2) and (5) yields an equation which is
void of the $b_1$ term,


(6a)  $b_2 (V_2 - V_{2\Delta 1}) + b_3 (c_{23} - c_{23\Delta 1}) = c_{02} - c_{02\Delta 1}$


which, using a dot in a subscript to mean "independent
of," may be written


(6)  $b_2 V_{2.1} + b_3 c_{23.1} = c_{02.1}$    cf. Table XII A row 2

Dividing by $(-V_{2.1})$ we have,


(7)  $-b_2 - b_3 b_{32.1} = -b_{02.1}$    cf. Table XII A row 2'


Multiplying (1) by the third coefficient of equation
(4), namely $(-b_{31})$, we obtain,


(8)  $-b_1 c_{13} - b_2 c_{23\Delta 1} - b_3 V_{3\Delta 1} = -c_{03\Delta 1}$    cf. Table XII A row 31


Adding (3) and (8) gives a second equation void
of the $b_1$ term


(9)  $b_2 c_{23.1} + b_3 V_{3.1} = c_{03.1}$    cf. Table XII A sum of rows 30 and 31


Equations (6) and (9) represent a three-variable
problem in which the coefficients of the b's are
symmetrical. Thus we can solve these in a similar
manner, obtaining a final equation yielding
$b_3$, which in the fuller notation of Table XII A
is $b_{03.12}$.
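The successive reductions of steps (4) to (9) may be traced numerically with the variances and covariances of the sample problem (Table XII C, covariances with the criterion taken positive). The sketch below is an illustration under stated assumptions: the variable names are hypothetical, and it finishes with an ordinary backward solution, whereas the modified layout of Table XII A is arranged to avoid one.

```python
# Variances and covariances of the sample problem (Table XII C).
V1, V2, V3 = 92.80, 149.92, 203.08
c12, c13, c23 = 47.62, 31.70, 10.47
c01, c02, c03 = 27.71, 28.92, 20.05

# Step (4): divide equation (1) by V1 (signs handled explicitly below).
b21, b31 = c12 / V1, c13 / V1

# Steps (5)-(6) and (8)-(9): remove the part dependent upon x1.
V2_1 = V2 - c12 * b21      # V_2.1  = V_2  - V_2(delta)1
c23_1 = c23 - c13 * b21    # c_23.1 = c_23 - c_23(delta)1
c02_1 = c02 - c01 * b21    # c_02.1
V3_1 = V3 - c13 * b31      # V_3.1
c03_1 = c03 - c01 * b31    # c_03.1

# Step (7) and the same reduction repeated: remove the part dependent upon x2.
b32_1 = c23_1 / V2_1
V3_12 = V3_1 - c23_1 * b32_1
c03_12 = c03_1 - c02_1 * b32_1

# Final equation, then an ordinary backward solution (an assumption of this sketch).
b03_12 = c03_12 / V3_12
b02_13 = (c02_1 - b03_12 * c23_1) / V2_1
b01_23 = (c01 - b02_13 * c12 - b03_12 * c13) / V1

print(round(b01_23, 6), round(b02_13, 6), round(b03_12, 6))
```

The three coefficients agree, within rounding, with the values .217055, .119858, and .058671 used in the check of the first condition equation.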

Table XII A gives a Doolittle solution for a
four-variable problem in algebraic notation,
modified to give all the regression coefficients,
thus obviating a "backward solution," and to
give additional essential constants in rows B
and C. When taking the square root of the squared
partial correlation coefficients the signs of
the corresponding regression coefficients in row
0 must be attached. When computing the multiple
correlation from the relationship


$$r^2_{0\Delta 123} = 1 - k^2_{0.123} = 1 - \frac{V_{0.123}}{V_0}$$   (See rows 0 and 00) [12:15]


the sign is always positive. As illustration of
the dot and delta notation we note


$$V_0 = V_{0\Delta 1} + V_{0.1}$$   (see [10:16])


That is, the variance of $x_0$ may be set equal to
the variance of a part which is dependent upon
$x_1$ plus the variance of a part which is independent
of $x_1$. And again,


$$c_{02} = c_{02\Delta 1} + c_{02.1}$$   [12:16]
or the covariance $c_{02}$ is equal to a part which
is dependent upon $x_1$ plus a part which is independent
of $x_1$. Since

$$c_{02\Delta 1} = r_{01} r_{12} \sigma_0 \sigma_2$$   [12:17]
[12:16] may be written

$$r_{02}\sigma_0\sigma_2 = r_{01} r_{12}\sigma_0\sigma_2 + (r_{02} - r_{01} r_{12})\sigma_0\sigma_2,$$
so that


$$c_{02.1} = (r_{02} - r_{01} r_{12})\sigma_0\sigma_2$$   [12:18]


The addition of any number of the same second- 
ary subscripts to each symbol does not change 
this relationship. 


бол 


у [12:19] 


epe Mme 
1 
The addition of any number of the same second- 


ary subscripts does not change this relationship. 
Bob bog = Dome п ое . [12:20] 


The addition of similar secondary subscripts to 
every symbol does not change this relationship. 

The condition equations whose solution is
provided for in Table XII A are numbers (1), (2),
and (3). For future reference we add a symmetrical
fourth row and write the coefficients of
the unknowns in the accompanying augmented matrix:


[Table XII A. Modified Doolittle solution, 4 variables, in algebraic notation: table body not reproduced.]




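$$\begin{Vmatrix} V_1 & c_{12} & c_{13} & -c_{01} \\ c_{12} & V_2 & c_{23} & -c_{02} \\ c_{13} & c_{23} & V_3 & -c_{03} \\ -c_{01} & -c_{02} & -c_{03} & V_0 \end{Vmatrix}$$   [12:21]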

In this arrangement the row and column connected
with the criterion variable are placed last and
the student will find it generally advantageous
to arrange the predictor variables so that they
are in an order, $x_1$, $x_2$, $x_3$, judged to be in
decreasing order of importance as predictors. The
advantage of this order is twofold. First, the
crucial regression coefficient to test for significance
is then $b_{03.12}$, which is the easiest
to obtain. Precisely to test for the significance
of $b_{03.12}$ one requires the few computations
of column III. Those of columns I and II
can be made later or not at all, as desired, depending
upon the outcome of the test for significance
of $b_{03.12}$. Second, should $b_{03.12}$ prove
nonsignificant one would then desire the regression
equation omitting $x_3$ and, it will be noted,
most of the computational work for this is already
at hand in the columns and rows of Table
XII A that do not involve $x_3$.


In the cell at the intersection of row 1' and
column $x_2$ is the entry (21), which is to say
that the numerical value found for this cell
need not be recorded in this cell, it being in
a more useful position if recorded in the multiplier
column, row 21. Similarly for (31), (01),
(32), etc.

Tables XII B and XII C provide the numerical 
data for the four-variable sample problem solved 
in Table XII E. 
TABLE XII B


SAMPLE 4-VARIABLE MULTIPLE CORRELATION PROBLEM
Means and Variances, N = 125


           X_1      X_2      X_3      X_0
M        52.20    43.60    45.40    68.15
V        92.80   149.92   203.08   110.32
σ         9.63    12.24    14.25    10.50

TABLE XII C


AUGMENTED VARIANCE AND COVARIANCE MATRIX
(-c_0i is entered)


           x_1      x_2      x_3      x_0
x_1      92.80    47.62    31.70   -27.71
x_2              149.92    10.47   -28.92
x_3                       203.08   -20.05
x_0                                110.32

TABLE XII D


AUGMENTED CORRELATION MATRIX
(-r_0i is entered)


           x_1      x_2      x_3      x_0
x_1      1.000     .404     .231    -.274
x_2               1.000     .060    -.225
x_3                        1.000    -.134
x_0                                 1.000
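The entries of the correlation matrix follow from the variances and covariances by dividing each covariance by the product of the two standard deviations. A minimal sketch of that arithmetic follows; the dictionary layout and variable names are assumptions made for illustration, and the covariances with the criterion are written with positive sign.

```python
import math

# Variances (Table XII B) and covariances (Table XII C).
V = {"x1": 92.80, "x2": 149.92, "x3": 203.08, "x0": 110.32}
cov = {("x1", "x2"): 47.62, ("x1", "x3"): 31.70, ("x1", "x0"): 27.71,
       ("x2", "x3"): 10.47, ("x2", "x0"): 28.92, ("x3", "x0"): 20.05}

# r_ij = c_ij / (sigma_i * sigma_j)
corr = {pair: c / math.sqrt(V[pair[0]] * V[pair[1]]) for pair, c in cov.items()}

for pair, r in corr.items():
    print(pair, round(r, 3))
```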
[Table XII E. Modified Doolittle numerical solution, 4 variables: table body not reproduced.]

It is to be noted that the check column applies
to the work between the double vertical rules.
To obtain a check of the b's it is necessary to
substitute them in one of the condition equations
and see if it reduces to zero. This is most
readily done employing the constants of the first
condition equation, as recorded in row 10 of
Table XII E. In symbols the equation is


$$b_{01.23} V_1 + b_{02.13} c_{12} + b_{03.12} c_{13} - c_{01} = 0$$


Thus we have
.217055 × 92.80 + .119858 × 47.62 + .058671
× 31.70 - 27.71 = .0002


This being zero within the accuracy of the computational
procedure constitutes a check.

There is no independent check upon the computational
work below row 0, so it is important
that this be performed with extra caution.

The multiple correlation squared, as given by
[12:15], is .0966, but this has a bias and should
be corrected for shrinkage by [12:36]. The corrected
value is .0742, or ${}_s r_{0\Delta 123} = .2724$. Since
the single correlation $r_{01}$ (see Table XII D)
equals .274, it is futile to use all three predictors
in this particular problem, but we shall
proceed to apply precise tests to the multiple
correlation coefficient and to the regression
coefficients. Utilizing [12:37], we have


$$F_{3,121} = \frac{.0966 \times 121}{.9034 \times 3} = 4.3128$$

This being greatly in excess of 1.0, we may be
sure that $r_{0\Delta 123}$ is not a chance deviation from
zero. The computation of P for this variance
ratio will only confirm this belief. We do, in
fact, find that P = .038.
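The shrinkage correction and the variance ratio just quoted can be verified by direct arithmetic. The sketch below assumes that [12:36] has the form ((N-1)r² - n)/(N-n-1) and that [12:37] has the form r²(N-n-1)/((1-r²)n); both forms reproduce the figures given in the text. The variable names are hypothetical.

```python
import math

N, n = 125, 3      # cases and predictors in the sample problem
r2 = 0.0966        # squared multiple correlation from the Doolittle solution

# Assumed form of [12:36]: squared multiple correlation corrected for shrinkage.
r2_shrunk = ((N - 1) * r2 - n) / (N - n - 1)
r_shrunk = math.sqrt(r2_shrunk)

# Assumed form of [12:37]: variance ratio with n and (N-n-1) degrees of freedom.
F = r2 * (N - n - 1) / ((1 - r2) * n)

print(round(r2_shrunk, 4), round(r_shrunk, 4), round(F, 4))
```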
To test the significance of a multiple-regression
coefficient we have a variance ratio of
the sort [10:78] by adding secondary subscripts
and noting that the error variance has (N-n-1)
degrees of freedom.


Pts 12. а, eee (W-n-1) 

F (н-п-1) 12:22] 
See also 

[12:24] 


For the two-variable problem we have 


Ио Мо 


Vs о 


M 
l-ro; 


Accordingly, adding secondary subscripts 
.)i(...2 throughout, we obtain 


ота ре 
ETE 4. 15 [12:23] 


еа derum 


Substituting in [12:22] yields 


Ко шыг? У.ол2...)1(...л (1-1) 
rai О [12:24] 


Since the negative of the reciprocal of the numerator
variance is given in row B and the square
of the partial correlation coefficient in row C,
all the necessary constants are available to
apply this test. Also, since this is a variance
ratio having one d.o.f. in the numerator, its
square root is a critical ratio and, if the d.o.f.
of the denominator is not small, then the square
root of this F may be treated as a deviate in a
unit normal distribution.
The variance ratio to test if $b_{03.12}$ is a
chance deviation from zero is


$$F_{1,121} = \frac{(.058671)^2 \times 121}{.005244 \times 99.6628 \times .993325}$$


$\sqrt{F_{1,121}} = .8957$ and P = .370.


Similarly for $b_{02.13}$ we obtain P = .139, and for
$b_{01.23}$ we obtain P = .040.

It is equally effective to test the partial
correlation coefficient as a chance deviation
from zero as to test the partial regression coefficient
as a chance deviation from zero. We have
$r_{03.12} = \sqrt{.006674} = .0817$. Using the Fisher r-into-z
transformation, we obtain $z_{03.12} = .08188$.
This has a standard error of .09129 ($= 1/\sqrt{121 - 1}$).
The critical ratio is .8969 and to three decimal
places P = .370, which is equal to that previously
obtained for $b_{03.12}$ as a chance deviation from
zero. These two tests are alternatives. The r-into-z
procedure gives a P of .142 for $r_{02.13}$,
which is to be compared with the previous value
.139. Similarly for $r_{01.23}$ we obtain P = .042,
to be compared with .040.
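The r-into-z test of the partial correlation can be retraced in a few lines. This is a sketch only: the variable names are hypothetical, and P is taken as double the upper tail area of the unit normal curve, computed with the standard library's `math.erfc`.

```python
import math

r_partial = 0.0817   # r_03.12 from the text
dof = 121            # N - n - 1 for N = 125, n = 3

z = math.atanh(r_partial)           # Fisher r-into-z transformation
se = 1 / math.sqrt(dof - 1)         # standard error of z
cr = z / se                         # critical ratio
p = math.erfc(cr / math.sqrt(2))    # two-sided area under the unit normal

print(round(z, 5), round(se, 5), round(cr, 4), round(p, 3))
```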

For the significance of Qf, -3,) we have a 
test similar to [10:38]. We note that (п+1) 
linear restrictions have been imposed upon the 
X, measures. They are the restrictions inherent 


in №, Сүү, Соз»... Thus we have 


— 
Variance ratio
T (44, -4,)? (N-n~1) test of У, 
LETERE S 7 ——À— ———— — — inn the газа 12:25] 


V of'multiply 


ЖИ correlated data 


SECTION 3. THE DETERMINANTAL EXPRESSION OF THE SOLUTION 
OF NORMAL EQUATIONS 


In connection with the condition equations
[12:13] let us write down the augmented correlation
major determinant $\Delta$. This corresponds
to [12:21] except that the variables here are
standard scores.


1 Fig 5 Ina ei 
Tjj 1 * * Fon Тоо 
A=A= ; 5 s А vee [12:20] 
Fax The 1 Ton 
Гуд”? Lops ones one ae 


The following relationships hold, in which, of
course, $\Delta'_{0i}$ is the determinant obtained by crossing
out the last row and the variable i column of
the major determinant $\Delta$, and $\Delta_{0i} = (-1)^{n+1+i}\Delta'_{0i}$.


A 
ко Se gaan На 
0.12... A 
A 
CaN ERIS con PE 12:20] 
Aoo 
5 Ayr a 2000 й» Ay; №з F 
$ Ч ЎАР дүрт алдлан = ресс 
: Ayo So А А. 




-A А 


Bi = ot = (=1)"+1+1 “ot [12:29] 
ЫТ; Aoo 


и [12:30] 


$\beta_{0i.12\ldots)i(\ldots n}$ is the expanded notation for $\beta_i$,
as given by [12:29]. The reversed parentheses,
)i(, indicate that i is omitted in the series of
secondary subscripts 1,2,...,n. Since $\Delta_{0i} = \Delta_{i0}$
and the sign factor for the position (0,i) is
the same as for the position (i,0), we can write


$$r^2_{0i.12\ldots)i(\ldots n} = \frac{\Delta_{0i}\Delta_{i0}}{\Delta_{00}\Delta_{ii}} = \frac{\Delta^2_{0i}}{\Delta_{00}\Delta_{ii}}$$   The squared partial correlation coefficient [12:31]


When extracting the square root to obtain


$r_{0i.12\ldots)i(\ldots n}$


the sign is the sign of $-\Delta_{0i}$, or in other words
it is the sign of $\beta_i$.

Since $\beta_i$ vanishes with this partial correlation,
the significance of $\beta_i$ as a deviation from
zero is exactly the significance of


$r_{0i.12\ldots)i(\ldots n}$


as a deviation from zero. The distribution of a
partial correlation coefficient is the same as
that of any other correlation coefficient, when
proper allowance is made for its reduced degrees
of freedom. The total correlation coefficient,
$r_{01}$, has (N-2) degrees of freedom; its variance
error is $(1-r^2_{01})^2/(N-2)$, as given by [10:46], of
Chapter X; and when transformed into a Fisher z,
see [10:43], it has a variance error, see [10:44],
of 1/(N-3). Accordingly, in the case of this
partial correlation in this (n+1) variable problem,


$$V(r_{0i.12\ldots)i(\ldots n}) = \frac{(1 - r^2_{0i.12\ldots)i(\ldots n})^2}{N-n-1}$$   Variance error of partial r [12:32]


$$V(z \text{ from partial } r) = \frac{1}{N-n-2}$$   Variance error of partial z [12:33]


Formula [12:32] gives the variance error of
the correlation between $X_0$ and $X_i$ for fixed
values of the remaining variables. This is to say that the
sample in question is one of an infinite number,
all of which have the identical $X_1, X_2, \ldots, X_n$
values and in the identical pairings, $X_1$ with $X_2$,
$X_2$ with $X_3$, etc., as in this given sample.
Thus the only variables which change from sample
to sample are $X_0$ and $X_i$. The partial correlation
$r_{0i.12\ldots)i(\ldots n}$, read "the correlation between
$X_0$ and $X_i$ for fixed values of the remaining variables,"
thus has no meaning unless those variables are
fixed. This partial correlation is indeterminate
unless $X_1, X_2, \ldots, X_n$ have fixed values, so no
formula based on the correlations in a single
sample, for the variance error of a partial correlation
coefficient (or of a regression coefficient
in a multiple variable problem) is possible
under totally free sampling.

Should we have a number of totally free samplings
and compute an $r_{01.23\ldots n}$ for each, these
partial r's would vary in meaning and would vary
in magnitude and their variance error could be
computed, but, so far as the writer is aware,
this has never been done. We must therefore
look upon the partial correlation coefficient
(and the multiple regression coefficient) as
having a known variance error only when all
variables, except the two primary ones in question,
have fixed values.

The variance of $b_i$, as given by [12:34], is
its variance when all variables except $X_0$ and $X_i$
have the fixed values of the observed sample,
thus by parity with [11:79],


$$V(b_{0i.12\ldots)i(\ldots n}) = \frac{V_{0.12\ldots n}}{V_{i.12\ldots)i(\ldots n}\,(N-n-1)}$$   Variance of a multiple regression coefficient [12:34]


The divergence of the observed value, $b_i$, from
any theoretical value, $\tilde b_i$, is to be tested by
the variance ratio


$$F_{1,(N-n-1)} = \frac{(b_i - \tilde b_i)^2\, V_{i.12\ldots)i(\ldots n}\,(N-n-1)}{V_{0.12\ldots n}}$$   [12:35] (see also [12:24])


When, as is common in multiple regression problems,
(N-n-1) is large, $\sqrt{F_{1,(N-n-1)}}$ is sensibly
the same as the x deviate in a unit normal distribution,
so that double the area to the right
of x is a close approximation to the variance
ratio P.
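As an illustration of the variance ratio for a regression coefficient and of the normal approximation just described, the following sketch tests $b_{03.12}$ of the sample problem against zero. It assumes that the variance of the coefficient is the residual variance of the criterion divided by the residual variance of the predictor times (N-n-1), which is the form consistent with the numerical work of Section 2. All variable names are hypothetical.

```python
import math

N, n = 125, 3
V0, V1, V2, V3 = 110.32, 92.80, 149.92, 203.08
c12, c13, c23 = 47.62, 31.70, 10.47
c01, c02, c03 = 27.71, 28.92, 20.05
b = [0.217055, 0.119858, 0.058671]   # b_01.23, b_02.13, b_03.12 from the text

# Residual variance of the criterion: V_0.123 = V_0 - sum of b_i * c_0i.
V0_123 = V0 - (b[0] * c01 + b[1] * c02 + b[2] * c03)

# Residual variance of x3 after removing x1 and then x2: V_3.12.
V2_1 = V2 - c12 ** 2 / V1
c23_1 = c23 - c12 * c13 / V1
V3_12 = (V3 - c13 ** 2 / V1) - c23_1 ** 2 / V2_1

# Assumed form of the variance of b_03.12 and the variance ratio against zero.
var_b3 = V0_123 / (V3_12 * (N - n - 1))
F = b[2] ** 2 / var_b3

# Normal approximation: double the area to the right of sqrt(F).
p = math.erfc(math.sqrt(F) / math.sqrt(2))

print(round(V0_123, 4), round(math.sqrt(F), 4), round(p, 3))
```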

The multiple correlation coefficient is a
biased measure, being larger than is to be anticipated
if the obtained regression is applied
to a new sample and the correlation with a similar
criterion computed. The amount of this bias
is known (see Fisher, 1923, and Wherry, 1939),
and the following formula applies, in which ${}_s r^2$
is $r^2$ corrected for shrinkage:

$${}_s r^2 = \frac{(N-1)\,r^2 - n}{N-n-1}$$   [12:36]

Giving correction in $r^2$ for shrinkage in the (n+1) variable
problem, cf. [11:116]
The limiting values of multiple r are 0 and 1,
so the form of distribution is not identical
with that of a total correlation coefficient, and
we shall not use the r-into-z technique of Chapter X.
We may compute a variance ratio, similar
to [10:42], to test the hypothesis that true multiple
$r^2$ is equal to 0. Corresponding to the
variance equation,


$$V_0 = V_{0\Delta 12\ldots n} + V_{0.12\ldots n} = V_0\, r^2_{0\Delta 12\ldots n} + V_0\, k^2_{0.12\ldots n}$$


is the degree of freedom equation,
(N-1) = n + (N-n-1)


The multiple correlation coefficient has as many
degrees of freedom as independent variables, n,
and the error variance has N-n-1, so that,


$$F_{n,(N-n-1)} = \frac{r^2_{0\Delta 12\ldots n}\,(N-n-1)}{(1 - r^2_{0\Delta 12\ldots n})\, n}$$   F to test hypothesis that $\tilde r_{0\Delta 12\ldots n} = 0$ [12:37]


In addition to the relationships of this Sec- 
tion, further relationships are neatly expressed 
by means of matrices, as explained in Section 4. 


SECTION 4. THE USE OF MATRICES IN EXPRESSING MULTIPLE 
CORRELATION RELATIONSHIPS 


The treatment herewith assumes the elementary
familiarity with matrices and determinants represented
by the discussion of Chapter XIV, Section 2.
The illustration herewith is based upon
a four-variable problem, but the relationships
given hold for a problem with any number of variables.
Were there a simple way to evaluate determinants
of fairly high order, the solution of
regression equations by means of matrices and
determinants would undoubtedly be the standard
method, for it brings into clear focus the underlying
algebraic and geometric relationships. We
start with an augmented matrix of type [12:38].
For condition equations [12:13] this is
1 ria Ой т 
г № 3 Ё 
12 23, n» 
@ = Ашдтеп ед matrix [12:38] 
гүз г; 1 гуз 


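The transpose of $\mathcal{A}$ is $\mathcal{A}'$ and its inverse is $\mathcal{A}^{-1}$.
We will designate the elements entering into $\mathcal{A}$
as $a_{ij}$. The cofactor of $a_{ij}$ is $\Delta_{ij}$, in which i
refers to the row and j to the column. When $\mathcal{A}$,
$\mathcal{A}'$, and $\mathcal{A}^{-1}$ are treated as determinants they are
designated $\Delta$, $\Delta'$, and $\Delta^{-1}$. The elements of $\mathcal{A}^{-1}$
(or of $\Delta^{-1}$) are $a^{ji}$ (at the intersection of the
j row and the i column).


$$a^{ji} = \frac{\Delta_{ij}}{\Delta}$$   Elements of the inverse matrix $\mathcal{A}^{-1}$ [12:39]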


When the matrix is of predictor variables only
we use a notation similar to the preceding, substituting
R for A. Thus,


$$\mathcal{R} = \begin{Vmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{Vmatrix}$$   Predictor matrix [12:40]

The meanings of $\mathcal{R}'$, $\mathcal{R}^{-1}$, $R$, $R'$, $R^{-1}$, $r_{ij}$ the
element in $\mathcal{R}$ (or $R$), $R_{ij}$ its cofactor, $r^{ji}$,
the element in $\mathcal{R}^{-1}$ (or $R^{-1}$), will be obvious.

$$r^{ji} = \frac{R_{ij}}{R}$$   Elements of the inverse matrix $\mathcal{R}^{-1}$ [12:41]

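The vector matrix of regression coefficients
is $\beta$,


$$\beta = \begin{Vmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{Vmatrix} = \begin{Vmatrix} \beta_{01.23} \\ \beta_{02.13} \\ \beta_{03.12} \end{Vmatrix}$$   [12:42]


The vector matrix of correlations with the
predictand is $r_0$,


$$r_0 = \begin{Vmatrix} r_{01} \\ r_{02} \\ r_{03} \end{Vmatrix} \quad \text{and transpose} \quad r_0' = \begin{Vmatrix} r_{01} & r_{02} & r_{03} \end{Vmatrix}$$   [12:43]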


The following relationships, which hold for 
any number of variables, will not be proven, but 
merely illustrated with the four-variable data 
of Section 2. 


$$\beta = \mathcal{R}^{-1} r_0$$   Yielding the matrix of β regression coefficients [12:44]


A 
20270565 Ко 1232р тэ 


tiarra Әл В еа «ше 25461 


$$V(\beta_i) = \frac{r^{ii}\, k^2_{0.123}}{N-4}$$   Variance error of a multiple regression coefficient, 4 variables [12:47]

For the general case we have


$$V(\beta_i) = \frac{r^{ii}\, k^2_{0.12\ldots n}}{N-n-1}$$   Variance error of a multiple regression coefficient, (n+1) variables [12:48] (See also [12:34])

$$= \frac{\Delta\, R_{ii}}{R^2\,(N-n-1)}$$   [12:48a]
A derivation of this follows directly from 
the ©, formula on page 600 of Fisher (1922). 
We also have 


Variance error of a mul- 


D ичиг 12:49] 


The significance of the difference between
two of the β coefficients entering into a single
regression equation is sometimes desired. These
are not independent, so a covariance term is
present. The general equation is:


$$V(\beta_i - \beta_j) = \frac{(r^{ii} + r^{jj} - 2r^{ij})\, k^2_{0.12\ldots n}}{N-n-1}$$   [12:50]

Variance error of difference between β regression coefficients, (n+1) variables


Further


$$F_{1,(N-n-1)} = \frac{(\beta_i - \beta_j)^2}{V(\beta_i - \beta_j)}$$   Variance ratio to test significance of $(\beta_i - \beta_j)$ [12:51]


Inherent in this equation is the relationship

$$r_{\beta_i \beta_j} = \frac{r^{ij}}{\sqrt{r^{ii}\, r^{jj}}}$$   [12:52]


Since the criterion variable does not enter
into the right-hand member we note that the correlation
between regression coefficients is independent
of the criterion, or predictand.
Under conditions of sampling such that $V_i$ and
$V_j$ do not vary from sample to sample, we have


Correlation between regres- 
их sion coefficients (12: 53] 
m 


The data of the four-variable problem used in 
the modified Doolittle solution is now employed 
to illustrate the solution by matrices and de- 
terminants. 

Using the values given in the augmented correlation
matrix $\mathcal{A}$, Table XII D, we evaluate as a
determinant and find that $\Delta = .714545$.

The predictor matrix $\mathcal{R}$ is


           (1)      (2)      (3)
         1.000     .404     .231
R =       .404    1.000     .060      in which the elements are designated $r_{ij}$
          .231     .060    1.000


Evaluating as a determinant we obtain R = .791022.
Utilizing [12:28] we obtain $r^2_{0\Delta 123} = .0967$, to be
compared to the value .0966 obtained in Section 2.
The cofactors of the elements in R are $R_{ij}$,
as recorded in the adjoint matrix herewith:


   .996400   -.390140   -.206760
  -.390140    .946639    .033324
  -.206760    .033324    .836784


Dividing each of these by R we obtain the elements
$r^{ji}$ of the inverse matrix $\mathcal{R}^{-1}$ herewith:

            1.259636   -.493210   -.261383
R^{-1} =    -.493210   1.196729    .042128
            -.261383    .042128   1.057852


Matrix multiplication yields the identity matrix
$\mathcal{I}$, which is useful as a check upon the arithmetic
accuracy of the work.


              1.000000    .000000    .000001
R R^{-1} =     .000000   1.000000    .000000   = I
               .000001    .000000   1.000000
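The determinant, cofactors, and inverse of the predictor matrix may be checked with a short routine. This is a minimal sketch for the 3-by-3 case only; the function name `cofactor` and the variable names are hypothetical, and the inverse is formed exactly as described, by dividing each cofactor by the determinant.

```python
R = [[1.000, 0.404, 0.231],
     [0.404, 1.000, 0.060],
     [0.231, 0.060, 1.000]]

def cofactor(m, i, j):
    """Signed 2-by-2 minor of a 3-by-3 matrix (row i and column j removed)."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

# Matrix of cofactors R_ij (symmetric here, so it equals its own transpose).
adjoint = [[cofactor(R, i, j) for j in range(3)] for i in range(3)]

# Determinant by expansion along the first row.
det_R = sum(R[0][j] * adjoint[0][j] for j in range(3))

# Elements r^{ji} = R_ij / R of the inverse matrix.
inverse = [[adjoint[j][i] / det_R for j in range(3)] for i in range(3)]

# Check: the product of the matrix and its inverse should be the identity.
product = [[sum(R[i][k] * inverse[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]

print(round(det_R, 6))
for row in inverse:
    print([round(v, 6) for v in row])
```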


The matrix of correlations with the criterion,
$X_0$, is $r_0$.


         r_01       .274
r_0 =    r_02   =   .225
         r_03       .134


The β regression coefficients matrix is


                      .199143       β_01.23       β_1
β = R^{-1} r_0 =      .139770   =   β_02.13   =   β_2
                      .079612       β_03.12       β_3

Since $\beta_i = b_i\,\sigma_i/\sigma_0$ we utilize the σ values of
Table XII B and obtain

$b_1 = \beta_1 \sigma_0/\sigma_1 = .217131$;  $b_2 = \beta_2 \sigma_0/\sigma_2 = .119901$;
$b_3 = \beta_3 \sigma_0/\sigma_3 = .058661$

which check with the values obtained from the
modified Doolittle solution, for three and four-figure
σ values were employed.

The $r^{ij}$ values in $\mathcal{R}^{-1}$ provide us with the
necessary information for testing the significance
of the difference between two β's, formula
[12:51]. For example:


$(\beta_1 - \beta_2) = .0593$


and
$$k^2_{0.123} = \frac{\Delta}{R} = \frac{.714545}{.791022} = .903319$$
so that
$$V(\beta_1 - \beta_2) = \frac{[1.259636 + 1.196729 - 2(-.493210)]\,(.903319)}{121} = .0257$$

$\sigma_{\beta_1 - \beta_2} = .16.$


Accordingly, the difference .0593 is not trustworthy.
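The arithmetic of this example can be retraced as follows. The sketch assumes the variance of the difference has the form (r¹¹ + r²² - 2r¹²)k²/(N-n-1), which is the form the printed figures follow; the variable names are hypothetical.

```python
import math

N, n = 125, 3
delta, R_det = 0.714545, 0.791022              # augmented and predictor determinants
r11, r22, r12 = 1.259636, 1.196729, -0.493210  # elements of the inverse matrix
beta1, beta2 = 0.199143, 0.139770

k2 = delta / R_det                                       # k^2_0.123
var_diff = (r11 + r22 - 2 * r12) * k2 / (N - n - 1)      # assumed form of [12:50]
sd_diff = math.sqrt(var_diff)
ratio = (beta1 - beta2) / sd_diff                        # critical ratio of the difference

print(round(k2, 6), round(var_diff, 4), round(sd_diff, 2), round(ratio, 2))
```

The critical ratio is well under 1, in keeping with the conclusion that the difference is not trustworthy.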


CHAPTER XIII
SUNDRY STATISTICAL ISSUES AND PROCEDURES 


SECTION 1. EVIDENCE OF PERIODICITY IN SHORT TIME SERIES 
(An abridgment of Kelley, 1943) 


Many time series are of short duration be- 
cause the recording of the event has, of neces- 
sity, proceeded for but a short time. Though it 
may be necessary that the student bide his time 
and wait for the years to pass in order to se- 
cure sufficient data, we should come to this 
conclusion only after having exhausted the pos- 
sibilities of small sample theory. A method 
which applies to short series and which enables 
a judgment of the hypothesis with fiducial 
limits is important in that, whatever the time 
span, the issue can be investigated and the hy- 
pothesis substantiated or rejected within the 
fiducial limits set by the investigator. 

Let the time variable be $X_1$ and let $X_0$ be the
variable which we postulate to be a periodic
function of $X_1$. Before investigating periodicity
we should first take out any linear, quadric,
or other non-periodic trend that may exist. We
pause to note that the inclusion of "quadric, or
other non-periodic" is a broadening, but we believe
justifiable, extension of the usual definition
of trend as that allowance which has to be
made for the time such that the residuals shall
be uncorrelated with the time.

For the purposes of precise test it is advantageous
to have the trend represented by a line
that places linear restrictions only upon $X_0$, as
do the parabolic regression constants of Chapter
XI, Section 4. Consider the following series of
estimates of $X_0$:


Estimate of $X_0 = M_0$ .... Zero regression [13:01]

$X_{0\Delta 1} = a + b X_1$ .... Linear regression [13:02]
$X_{0\Delta 1,1^2} = A + B X_1 + C X_1^2$ .... Quadric regression [13:03]
$X_{0\Delta 1,1^2,1^3} = \alpha + \beta X_1 + \gamma X_1^2 + \delta X_1^3$ .... Cubic regression [13:04]


We test the need of [13:02] rather than [13:01],
that is, we test the hypothesis b = 0, as in
Chapter XI, Section 4. Or again we test the
need of [13:03] rather than [13:02], that is the
hypothesis C = 0, etc. When no trend is removed
[13:01] provides the estimate of $X_0$ and the quantities
related to $X_0$ in which we seek periodicity
are $(X_0 - M_0)$. These have N-1 degrees of
freedom. If a linear trend is removed by [13:02]
the residual quantities, having N-2 degrees of
freedom, in which we seek periodicity are
$X_{0.1}$ $(= X_0 - X_{0\Delta 1})$.

When a quadric trend is removed by [13:03] the
residuals are $X_{0.1,1^2}$ and they have N-3 degrees
of freedom.

A similar procedure, equation [13:04], could
remove a cubic trend, but should be used with
caution as a cubic is itself a close approximation
to a complete cycle of a certain sine curve.
The cubic curve of Chart XIII I, which cuts the
quadric much as would a sine curve with a period
of 16 years, is an illustration of this.


CHART XIII I
BIRTHS IN UNITED STATES 1920-1940
(Vertical axis: annual number of births, in thousands. Horizontal axis: year, 1920 to 1940.)


Using the data in the first two columns of
Table XIII A, we find a quadric trend (C of
[13:03] ≠ 0), and remove the same, obtaining residuals
$X_{0.1,1^2}$, which may be investigated for
periodicity. We employ a periodogram technique
with two modifications. The usual periodogram
is as shown in the dotted line of Chart XIII II.
The raw correlation ratio squared, $\eta^2$, is plotted
for successively greater periods, T. The argument
is that in the neighborhood of the period
yielding the largest $\eta^2$ will be found the actual
period of the data. For this argument to be
precise it is necessary that a probability statement
be attached to the outcome. Thus in addition
to, or instead of, the $\eta^2$ ordinate in the
periodogram we employ the ordinate (1-P), the P
being that from a variance ratio, the numerator
variance being that of the means of the classes
into which the data have been grouped when items
of the same phase for the period T constitute a
group, and the denominator variance being the customary
residual, or error, variance.

A briefer process, yielding essentially the
same picture, as illustrated in Chart XIII II, is
to use as the ordinate the unbiased correlation
ratio squared, $\epsilon^2$. There lies in $\epsilon^2$ the additional
value that it gives a quantitative statement
of the correlation between time, treated as
a periodical variable of period T, and the magnitudes,
$X_{0.1,1^2}$, being investigated.


CHART XIII II
PERIODOGRAM
(Vertical axis: P scale. Horizontal axis: period in years, 2 to 18.)


We start with N equally time-spaced observations
$X_0$. We find that a quadric trend exists
by the method of Chapter XI, Section 4, and remove
it, giving N residuals, $X_{0.1,1^2}$, having N-3
degrees of freedom.

TABLE XIII A
BIRTHS IN THE UNITED STATES 1920- 1940 
(World Almanac, 1943, p. 472) 


x Mos Tier Жула? PHASES FOR PERIODS 7 | 
1920 | 150887 а а 
! b b 
2 c c 
3 a d 
4 b a e 
5 c b f 
6 a c a g 
7 b 1 с b a 
8 c a 1 с b 
9 а b e d c 
1930 b c a e 4 
! с |9 d S e 
2 a a c a f 
3 DESDE b | g 
4 | 2167636 с с е c a 
5 | 2155105 а 3 а 4 
6 | 2144790 b a b e 
7 | 220333 с b c f 
8 assos ERE. 
9 | 2265558 ЭЛ Р е. | b 
230399 с 


үм» E 


[13:05] - 


(8 | 
а 
b 
c 
4 
e 
f 
g 
h 
a 
b 
c 
1 
е 
+ 
9 
h 
a 
b 
с 
4 
Let the period under investigation be T of
the equally spaced time intervals. Of necessity
1 < T < (№2) and the practical limits of utili- 
ty of T, fractional or integral, may be taken 
such that ет (9-2), while if a zero trend, 
equation [13:01], is involved 2 < T < (N-1). 

If N is exactly divisible by T, there are T
classes each having N/T measures, $X_{0.1,1^2}$, of
the same phase in it. If N is not exactly divisible
by T, we have a variable number of measures
from phase to phase and the number of classes
will be some number which we call k, greater
than N/T. Let the means of the $X_{0.1,1^2}$ measures
in these classes be $M_a, M_b, \ldots, M_k$. The variance
of these means, each weighted with
the number of measures in each class, is designated
$V(M_i)$. The mean for each class is independent
of that for every other class except
that the mean of the means is equal to zero, so
that there are just (k-1) degrees of freedom.
Letting i take all values from a to k, we can
express the variable $X_{0.1,1^2}$ as equal to the sum
of two independent parts. We have the following
relationships:


Xe Жи oi е CLS: 05 
N-3 = (k-l) *(N-2-k) 0.0.F. equation [13:06 
Уз? = VOIE, ү 12,1 variance [13:07 
р MD ayog 


Vo., 12, AMR) (У, 1 2M CRD 


Testing periods which are fractional introduces
no complication if proper allowance is made
for the decrease in N because certain $X_0$ values
are not used.
The columns of Table XIII A in order provide:


Column 1  Year

Column 2  Number of births

Column 3  $X_{0\Delta 1,1^2}$ as given by [13:03]

Column 4  $X_{0.1,1^2}$, residual deviations to be
          tested for periodicity

Column 5  Column 4 values opposite a are in
          the same phase and thus constitute
          one class. Values opposite b constitute
          the second class when the
          period being investigated is two
          years.

Columns 6, 7,...,21 are similar for periods
of 3, 4,...,18 years.


At the feet of columns 5, 6,...,21 are recorded,
in order,


$\eta^2$, $F_{(k-1),(N-2-k)}$, the corresponding P, and $\epsilon^2$.


We first investigate trend. $X_0$ = number of
births per year, $X_1$ = date, and for convenience
we let $x_1 = X_1 - 1930$. We find:


$M_0 = 2050140$, and $V_0 = 4617948 \times 10^4$
$\tilde{X}_{0\Delta 1} = 2050140 + 30929\,x_1$


$r_{01}^2 = .75956$


To test the significance of the deviation of
30929 from zero we compute


$F_{1,\,N-2} = \dfrac{r_{01}^2\,(N-2)}{1 - r_{01}^2} = 60.022$


which yields a very small P. We next fit a
quadric regression line, obtaining

$\tilde{X}_{0\Delta 1,1^2} = 2118995 + 30929\,x_1 - 1877.9\,x_1^2$     [13:09]


$r_{0\Delta 1,1^2}^2 = .84113$


$F_{1,\,N-3} = \dfrac{(r_{0\Delta 1,1^2}^2 - r_{01}^2)(N-3)}{1 - r_{0\Delta 1,1^2}^2} = 9.2419$


yielding P = .048. Having odds of 952 to 48
that -1877.9 is not a chance deviation from zero
we take [13:09] as the trend line and compute
residuals

$X_{0.1,1^2} = X_0 - \tilde{X}_{0\Delta 1,1^2}$

as recorded in column 4 of Table XIII A.

We give herewith a cubic regression line, not
as a part of the procedure of determining periodicity,
but because it is interesting to compare
such a line with the evidence available as to
periodicity.


$\tilde{X}_{0\Delta 1,1^2,1^3} = 2118994 + 11149\,x_1 - 1877.8\,x_1^2 + 300.61\,x_1^3$

$r_{0\Delta 1,1^2,1^3}^2 = .89917$


$F_{1,\,N-4} = \dfrac{(r_{0\Delta 1,1^2,1^3}^2 - r_{0\Delta 1,1^2}^2)(N-4)}{1 - r_{0\Delta 1,1^2,1^3}^2} = 9.7856$


yielding P = .045. This establishes the non-chance
nature of the coefficient 300.61. We will
later note that we do not establish periodicity
with the certainty that we here establish the
fitness of a cubic regression line.
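
The three variance ratios of the trend study can be checked with a few lines of Python. This is only a sketch: the function name `f_for_added_term` is hypothetical, N = 21 is the number of years 1920-1940, and the squared correlations .75956, .84113 and .89917 are taken as the values consistent with the reported F's of 60.022 and 9.2419.

```python
def f_for_added_term(r2_full, r2_reduced, n, n_constants_full):
    """F for the last term added to a polynomial trend.

    r2_full          squared correlation with the added term included
    r2_reduced       squared correlation without it (0.0 for the linear test)
    n                number of observations
    n_constants_full constants fitted in the fuller equation
    Returns (F, error degrees of freedom); the numerator has 1 d.f.
    """
    df_error = n - n_constants_full
    f_value = (r2_full - r2_reduced) * df_error / (1.0 - r2_full)
    return f_value, df_error


N = 21  # yearly birth figures, 1920 through 1940
steps = [
    ("linear", 0.75956, 0.0, 2),
    ("quadric", 0.84113, 0.75956, 3),
    ("cubic", 0.89917, 0.84113, 4),
]

results = {}
for name, r2_full, r2_reduced, n_constants in steps:
    results[name] = f_for_added_term(r2_full, r2_reduced, N, n_constants)
    print(name, "F = %.4f" % results[name][0], "error d.f. =", results[name][1])
```

Each F has one degree of freedom in the numerator, so it is to be referred to the F table with 1 and 19, 18, or 17 degrees of freedom respectively.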

To test for the existence of a two-year period
the residuals of column four are to be grouped
into two classes, a and b as indicated in column
5. The a class has 11 measures and yields a
mean, $M_{2a}$, and the 10 measures of class b yield
a mean, $M_{2b}$. Weighting these means 11 and 10
respectively and computing their variance, we obtain
$V(M_2)$, the subscript 2 being k, the number of
classes involved. Analyzing variance as indicated
in [13:07] and employing [13:08], we have: $F_{1,17}$
= .073, yielding P = .825, which, being > .5,
yields no evidence whatever that a period of two
years exists. Similar computations for T=3, T=4,
..., T=18 have been made and are as recorded at
the feet of the appropriate columns of the table.
When T = 16, P is .152. Thus the odds are about
5 to 1 that a period in the neighborhood of 16
years exists. The F for this case is an $F_{15,3}$.
We have but 3 degrees of freedom available in the
error variance. Should a real period of 16 years
exist it is not surprising that data covering but
21 years are unable to establish it with satisfactory
certainty.
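
The grouping into phase classes and the resulting variance ratio can be sketched in Python. This is a minimal sketch for integral periods only. The function name `period_f_ratio`, the `trend_order` argument and the toy residuals `demo` are illustrative assumptions, and the ratio computed is the ordinary between-class mean square over the within-class mean square, which we assume corresponds to [13:08], with the (k-1) and (N-2-k) degrees of freedom of [13:06] when a quadric trend has been removed.

```python
def period_f_ratio(residuals, period, trend_order=2):
    """Variance ratio for an integral trial period.

    residuals   deviations from the fitted trend, in time order
    period      trial period T, in units of the time interval
    trend_order degree of the trend already removed (2 = quadric)
    Returns (F, d.f. between classes, d.f. within classes).
    """
    n = len(residuals)
    classes = {}
    for i, value in enumerate(residuals):
        classes.setdefault(i % period, []).append(value)  # same phase, same class
    k = len(classes)
    grand_mean = sum(residuals) / n
    ss_between = 0.0
    ss_within = 0.0
    for members in classes.values():
        class_mean = sum(members) / len(members)
        # each class mean is weighted by the number of measures in the class
        ss_between += len(members) * (class_mean - grand_mean) ** 2
        ss_within += sum((v - class_mean) ** 2 for v in members)
    df_between = k - 1
    df_within = (n - 1 - trend_order) - df_between
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy residuals (not the birth data): a strict two-year swing plus small noise.
demo = [(-1) ** i + 0.1 * ((i * 7) % 5 - 2) for i in range(21)]
print(period_f_ratio(demo, 2))   # degrees of freedom 1 and 17
print(period_f_ratio(demo, 16))  # degrees of freedom 15 and 3
```

With 21 residuals the degrees of freedom come out as 1 and 17 for T = 2 and as 15 and 3 for T = 16, agreeing with the counts given in the text.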

The advantages of employing (1-P) in lieu of
$\eta^2$ in plotting and interpreting a periodogram
wherein the number of degrees of freedom is small
are exemplified in Chart XIII II.

To a degree the advantages of (1-P) are also
inherent in $\epsilon^2$, as is indicated by the periodogram
curve based upon $\epsilon^2$. The derivation of
$\epsilon^2$ in Chapter XI, Section 5, gave the relationship


$\epsilon^2 = 1 - \dfrac{N-1}{N-k}\,(1 - \eta^2)$     [11:17]


but this was postulated upon using a zero order
regression equation [13:01]. When a linear trend
is taken out the relationship is


$\epsilon^2 = 1 - \dfrac{N-2}{N-1-k}\,(1 - \eta^2)$     [13:10]


If a quadric trend is removed, we have


$\epsilon^2 = 1 - \dfrac{N-3}{N-2-k}\,(1 - \eta^2)$     [13:11]


and so forth for the removal of trends of higher
parabolic order. The values of $\epsilon^2$ given in the
table were computed by [13:11]. The median $\epsilon^2$
value of the table is .012 which differs but
slightly from .000, the value to be expected if
there is no period in the data.
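
The three relationships differ only in the number of constants absorbed by the trend, so they may be written as one function. The following Python sketch assumes that form; the name `epsilon_squared` and its `trend_order` argument (0, 1, or 2 for zero order, linear, or quadric trend) are hypothetical.

```python
def epsilon_squared(eta2, n, k, trend_order=2):
    """Unbiased correlation ratio after a trend of the given order is removed.

    eta2        raw squared correlation ratio of the k phase classes
    n           number of observations
    k           number of phase classes
    trend_order 0 gives (N-1)/(N-k), 1 gives (N-2)/(N-1-k),
                2 gives (N-3)/(N-2-k)
    """
    df_total = n - 1 - trend_order
    df_within = df_total - (k - 1)
    if df_within <= 0:
        raise ValueError("no degrees of freedom left for the error variance")
    return 1.0 - (df_total / df_within) * (1.0 - eta2)


# With N = 21 and a quadric trend removed, a raw eta squared equal to
# (k-1)/(N-3) is exactly what chance alone would be expected to give.
print(epsilon_squared(4.0 / 18.0, 21, 5))
print(epsilon_squared(0.50, 21, 3, trend_order=0))
```

The first call returns zero (to rounding), illustrating why the median of the tabled values should lie near .000 when no period is present.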

The periodogram analysis here discussed may be
thought of as terminating the problem only in case
no period is discovered. If the analysis does
establish one or more periods, their further specification
could be accomplished by fitting, say,
a Fourier series, by a method which is appropriate
when the time span of the data is not an exact
multiple of the period, or periods.


SECTION 2. LEAD AND LAG IN TIME SERIES


The attempt is frequently made to predict the
fluctuations of one time series from those of
another. Let us consider two situations: except
for the aberrations which operate like chance,
(a) time series A and time series B fluctuate
concomitantly, and (b) time series A tends to lag
behind time series B by a constant time interval.

In the (a) situation the correlation between
the two series may be of interest in a historical
study and it may even be of interest in a prediction
study if it is easy to secure the information
for one of the series and difficult for the
other. Thus, if relatively easy-to-get steel
prices neither lead nor lag behind hard-to-get
heavy industry prices, it obviously may be of
advantage to use the first to predict the second.

Knowledge of the correlation maintaining in a
(b) situation is generally even more useful, for
the fluctuations of the leading variable may be
used to predict the future fluctuations of the
lagging variable; for example, rainfall in June
might be most useful in predicting harvest in
August. The three chief statistical issues of
this problem are (1) by what amount of time does
the predictand variable lag behind the predictor,
(2) what is the correlation between them, and (3)
what are the standard errors, or the fiducial
limits, in the time lag measure and in the correlation
measure. The third issue has, as yet, no
adequate solution. A sequential analysis based
upon correlated data or the employment of subsamples
seems called for, but these are beyond
the scope of this text. An analysis yielding a
solution to the first two issues will be illustrated
in connection with the data of Table
XIII B.

Let $X_0$ be the predictand, $X_1$ the predictor,
and $X_2$ the time variable. There is probably some
trend in the $X_0$ series and the same or a different
trend in the $X_1$ series. We can take out the
trends involved by the method of the preceding
Section and correlate the residual variables,
pairing measures corresponding to the same moment
of time and again pairing them with some time lag.
This process might wholly or partially eliminate
the measure of lag that concerns us. Suppose the
trends are $\tilde{X}_{0\Delta 2} = f(X_2)$ and $\tilde{X}_{1\Delta 2} = f(X_2 + A)$,
in which the function of $X_2$ in the first equation
is identical with the function of $X_2 + A$ in the
second. Clearly A is exactly the measure of lag
which we wish, but it has been removed in the
taking out of the trends and it will not again
reveal itself by a study of the correlations,
with various lags, between the residual measures
$X_{0.2}$ and $X_{1.2}$. There is promise in the study of
lag in an approach which requires that $\tilde{X}_{0\Delta 2}$ and
$\tilde{X}_{1\Delta 2}$ be identical functions, the first of $X_2$
and the second of $X_2 + A$, if we first express the
predictand and the predictor variables in comparable
units. Comparability of measuring units is
generally involved in any study of lag.

An obvious way to attempt to secure comparability
is to deal with standard scores, as used
in [13:13]. This approach may be expected to be
most serviceable in connection with variables for
which we have no indubitable zero points.

In the case of prices we have such a point and
equal relative changes in prices are in general
equally important and meaningful. In this case
the equivalence of ratios method, [13:15], is
fruitful. A still better procedure in this case
of an indubitable zero point is to let $X_0$ and $X_1$
be the logarithms of the gross measures. Though
this logarithmic transformation is appropriate
to the "all foods" and the "beverages and chocolate"
price indexes of Table XIII B, we will
follow the more common practice of taking these
price ratios as they stand for the $X_0$ and $X_1$
variables.

We pose the question: are "all foods" retail
prices a bellwether for "beverages and chocolate"
retail prices and, if so, by what period of time
do the latter lag and what is the correlation
between them?

Pairing $X_0$ and $X_1$ items as in Table XIII B is
a pairing with zero lag and it provides a sample
of 84 paired measures. Pairing February $X_0$ with
January $X_1$, March $X_0$ with February $X_1$, etc. is
the one month lag pairing and the number in the
sample is 83. Pairing March $X_0$ with January $X_1$,
etc. provides the 82 cases in the two month lag


TABLE XIII B

RETAIL PRICES, 1929-35
51 large U.S. Cities: Average 1923-25 = 100.
U.S. Department of Labor, Bureau of Labor Statistics,
Serial no. R 384, 1938.

$X_0$ = "beverages and chocolate," and $X_1$ = "all foods."
1929 1931 1933 1935 

ЖЫ йе NEA Х, X, X; X, X% X 
Jan, 110.7 102.7 90.2 39:22 2-Й 62:0 7355: 7751 
Feb. 110.8 102.3 89.4 86.0 69.5 50.1 73-3 79.7 
Mar. 110.9 101.4 87.3 85.1 68.5 59.8 72.3 79.7 
Арг. 111.0 100.8 84.0 83.9 68.4 60.1 71.4 81.6 
Мау 110.8 102.4 82.4 82.6 67.7 62.5 70.8 81.4 
Jun. 110.5 103.7 82.0 80.5 67.3 64.9 70.4 81.7 
Jul. 110.6 105.5 “SIR 80.7% 67.4 71.01 59.9. 79:9 
Aug. 110.4 108.1 Slat? 30:9). “Cred 72.0 56993 27996 
Sep. 110.2 108.0 81.0 80.6 67.8 72.0 68.4 30.0 
Oct, 110.1 107.6 80.щ 79.9 68.4 71.2 68.0 80.2 
Ноу. 108.9 105.7 79.9 78.2 68.4 70.8 67.8 81.0 
Dec. 105.3 105.7 79.4 75.2 68.0 69.7 (67.5 32.2 

Х, Х, Х, Х, Х, Х, 

1930 1932 1934 

Jan. 101.3 104.6 78.4 72.8 68.7 70.5 

Feb. 99.1 103.4 77.4 70.5. 69.6 72.8 

Mar. 97.9 102.0 76.9 70.7 70.6 72.6 

Apr. 97.2- 103.3 — 76:001 70:3: filet n7 2:2. 

May 96.1 102.6 74.9 88.5 72.1 73.0 

Jun. 95.6 101.2 74.4 67.5 72.2 73.2 

Jul. 95.7 97.5 74.2 68.3 72.1 73.5 

Aug. 95.0 96.6 73.7 67.1 724 75.2 

Sep. 93.8 98.3 74.6 66.7 72.8 765.8 

Oct. 92.9 97.8 74.5 65.3 73.1 75.8 

Nov. 92.2 95.2 73.8 65.6 73.0 75.2 

дес. 91.9 92.1 72.8 64.7 73.3 74.6 


series, etc. The correlations resulting from
these different pairings yield a series whose
maximum can be found by fitting a parabola, see
[7:23], and the location of this maximum, i.e.,
the desired period of lag, is given by [7:24].
This process may be laborious so the following
short-cuts are noted: The difference correlation
formula, [10:52], shows that for $V_0$ and $V_1$
constant (and they are nearly constant in going
from the sample of 84 to that of 83 to that of
82, etc.) the correlation is directly dependent
upon the variance of the $(X_0 - X_1)$ differences.
Further, the easy to compute (see [6:68]) average
deviation of these differences may be substituted
for the variance of the differences to
give an order of magnitude.

Thus, first compute differences $(X_0 - X_1)$ for a
succession of lags, then compute the average deviation
for each of these sets, stopping when a
minimum is obviously passed. For each lag in
the neighborhood of this minimum compute the
three values, $V_0$, $V_1$, $V(X_0 - X_1)$, employed in the
correlation formula [10:52], and compute the requisite
correlations.
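
The short-cut may be sketched in Python as follows. The sketch rests on two assumptions not confirmed by this chapter: that the difference correlation formula [10:52] is r = (V_0 + V_1 - V(X_0 - X_1)) / (2 sqrt(V_0 V_1)), and that the average deviation of [6:68] is the mean absolute deviation from the mean. The function names and the artificial series are illustrative only; the artificial predictand is simply the predictor delayed two intervals.

```python
import math


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def average_deviation(values):
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def difference_correlation(v0, v1, v_diff):
    """Correlation recovered from two variances and the variance of differences."""
    return (v0 + v1 - v_diff) / (2.0 * math.sqrt(v0 * v1))


def lag_scan(x0, x1, max_lag):
    """For each lag pair x0[t + lag] with x1[t]; return (lag, N, A.D., r) rows."""
    rows = []
    for lag in range(max_lag + 1):
        predictand = x0[lag:]
        predictor = x1[:len(x1) - lag]
        diffs = [a - b for a, b in zip(predictand, predictor)]
        r = difference_correlation(variance(predictand), variance(predictor),
                                   variance(diffs))
        rows.append((lag, len(diffs), average_deviation(diffs), r))
    return rows


# Artificial series: x0 repeats x1 two intervals later.
x1 = [100 + 10 * math.sin(t / 5.0) for t in range(60)]
x0 = [x1[max(t - 2, 0)] for t in range(60)]
rows = lag_scan(x0, x1, 4)
best = min(rows, key=lambda row: row[2])  # smallest average deviation
for row in rows:
    print("lag %d  N %d  A.D. %.3f  r %.4f" % row)
```

The average deviation falls to its minimum at the true lag of two intervals, and the correlation computed there is the maximum of the series, which is the behavior the short-cut relies upon.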

Performing these steps upon the data of Table
XIII B yields the information given in Table
XIII C.


TABLE XIII C


CONSTANTS FOUND FOR VARIOUS LAGS IN
"BEVERAGES AND CHOCOLATE"


Lag in months             0         1         2         3         4
N                        84        83        82        81        80
A.D. of (X_0 - X_1)   5.006     4.759     4.752     4.805     4.986
V_0                 197.250   199.534   202.031   204.491   206.950
V_1                 211.259   203.619   195.404   186.962   177.950
V(X_0 - X_1)         36.897    35.101    33.571    34.588    36.017
r                     .9103     .9131     .9156     .9125     .9092


The quadric best fitting the first four points
(0 lag, .9103; 1 mo. lag, .9131; 2 mo. lag, .9156;
3 mo. lag, .9125) is given by [7:23]. It has a
mode, as given by [7:21] and [7:24], at 1.808
months, revealing a lag in the retail prices of
"beverages and chocolate" upon the retail prices
of "all foods" of 54 days. The correlation with
this lag, as given by [7:25], is .915. This
completes the computation.
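
The location of the mode can be verified numerically. Formulas [7:23] to [7:25] are not reproduced in this chapter, so the Python sketch below assumes that they amount to an ordinary least-squares parabola fitted to the four points; the helper names `det3` and `fit_parabola` are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares y = a + b*x + c*x**2, solved by Cramer's rule."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    d = det3([[n, s1, s2], [s1, s2, s3], [s2, s3, s4]])
    a = det3([[t0, s1, s2], [t1, s2, s3], [t2, s3, s4]]) / d
    b = det3([[n, t0, s2], [s1, t1, s3], [s2, t2, s4]]) / d
    c = det3([[n, s1, t0], [s1, s2, t1], [s2, s3, t2]]) / d
    return a, b, c


lags = [0, 1, 2, 3]
correlations = [0.9103, 0.9131, 0.9156, 0.9125]
a, b, c = fit_parabola(lags, correlations)
mode = -b / (2.0 * c)           # lag at which the fitted parabola is highest
peak = a - b * b / (4.0 * c)    # fitted correlation at that lag
print("mode = %.3f months, about %.0f days" % (mode, mode * 30))
print("correlation at the mode = %.3f" % peak)
```

Under this assumption the fitted mode falls at about 1.808 months and the fitted correlation at about .915, in agreement with the values reported.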

Serviceable standard errors could be gotten
by repeating the previous process a number of
times, using the data for other years, and obtaining
a distribution of lag measures and also
a distribution of correlation coefficients.


SECTION 3. IMPOSED CONDITIONS 


Limits. A phenomenon common to the physical,
biological and social sciences is that some
magnitude approaches a limit as time, or age,
increases. The typical growth curves of biology
have an adult limiting value which they approach
asymptotically as age increases. A chemical reaction
may approach a state of equilibrium in a
very similar manner. A bond price may gradually
approach parity, or some less readily defined
value determined by general economic conditions,
as time passes. Not uncommonly the forgetting
and the learning curves that the psychologist
encounters are of the same type. In all these
cases the anticipated statistical problem is
first, to determine the nature of the complete
growth curve from a study of complete or nearly
complete antecedent phenomena, second, to fit a
curve of the established type to data for which
the time sequence is incomplete and third, by
extrapolation to secure an estimate of the limiting
value.

For example, let us say, first, that a study of the
height growth curves of a substantial number of
human males who have been followed from infancy
to adulthood reveals that the phenomena of increase
in height follow a certain pattern, or
type, and second that we have a record of growth
in height of a particular boy. Then, third, we
fit the type curve to the individual record and
find the maximum, or adult value of the fitted
curve. This we take as the estimate of the prospective
adult height of the individual. There
are a number of more or less difficult statistical
problems involved in each of the three steps:
(1) the determination of the type curve; (2) the
fitting of it to the individual record; and (3)
not the determination of the adult value, which
is very simple, but the determination of its
standard error (or better of its entire distribution
form), which is the essential measure of
credibility to be attached to the limiting, or
adult, value just determined.

The algebraic solution of these problems is 
beyond the scope of this elementary work, but 
the use of graphic devices will serve as a good 
introduction to the sundry issues and hazards 
which are present. 

Subordination. In medical practice certain
syndromes have been found to be characteristic
of certain diseases. Thus if a doctor suspects
disease α he looks for symptoms A, B, C, and D,
whereas if he suspects disease β he looks for
symptoms B, E, F, and G. If the symptoms are
qualitative, that is definitely either present
or absent, very adequate contingency methods employing
punched or notched cards with machine or
manual sorting may be employed. These have
proven very serviceable in limiting and refining
diagnoses. When the symptoms are quantitative,
as for example would be hours of sleep, metabolic
rate, evidence of lassitude, and especially when
the symptoms are in part qualitative and in part
quantitative, the determination of an adequate
procedure is much more involved. The problem is
common to many fields. For example the adaptability
of crop to terrain is a complex function
both of quantitative phenomena like rainfall and
soil ingredients and of qualitative phenomena
like presence or absence of a freezing temperature,
a cultivation practice, etc. Though the
full handling of this question is beyond the
scope of this text, the approach to it is through
the statistics of contingency, multiple correlation
and factor analysis.


SECTION 4. COMPARABLE MEASURES *


For two differently derived quantitative measures
to be comparable it is logically necessary
that, except for a chance factor, they measure
the same underlying phenomena. It frequently
happens in connection with social phenomena that
a first score is in large part a measure of the
same thing as a second differently derived score,
as, e.g., are two intelligence test scores. Both
tests are designated by their authors "intelligence"
tests and both probably measure in large
part the same underlying function, but in lesser
part each measures a capacity found in the one
but not in the other. Equating the scores of
two such tests violates the logical requirements
of equatable scores, but nevertheless it has
been done frequently and with an outcome that
has seemed serviceable. The matter of how similar
two functions should be before they are
equated is a problem that we leave to economists,
sociologists and psychologists concerned with
particular issues. We here assume that except
for chance factors in each of $X_1$ and $X_2$ the
underlying function measures are identical.

The equivalence of successive percentile method:
If scores $X_1$ show a monotonic increase with
increase in the underlying function and similarly
for scores $X_2$, and if the chance factors in each
are negligible, then clearly the same percentile
scores on $X_1$ and $X_2$ are equivalent. A table can
be drawn up giving in neighboring columns the
successive percentiles, $P_1, P_2, \ldots, P_{99}$, for $X_1$
and for $X_2$ and these pairs are equivalent scores.
Or, in general


$X_1 \text{ at } P_i \;\Leftrightarrow\; X_2 \text{ at } P_i$     Equivalent percentiles [13:12]

This method does not require that the form of
distribution of the $X_1$ scores be the same as that
of the $X_2$ scores. In general the method is not
serviceable when $X_1$ and $X_2$ have quite different
reliabilities; as, e.g., would be the case if $X_1$
is an intelligence score on a 60 minute test and
$X_2$ a score on a 10 minute test.

* A more detailed discussion with illustrations is given in
Truman L. Kelley, STATISTICAL METHOD (1923), Chapter VI.

The standard score method: This method derives
from the practices of Francis Galton (who,
however, used medians and quartile deviations)
and it asserts that standard scores as defined
in [8:23] of equal magnitude are equivalent. In
addition to the requirement that the underlying
functions are the same, this procedure requires
that $M_1 \Leftrightarrow M_2$; that $\sigma_1 \Leftrightarrow \sigma_2$; that the form of
distribution of $X_1$ is the same as that of $X_2$;
and that the relative chance errors in $X_1$ and $X_2$
are equal, namely that $\sigma_{1.\infty}/\sigma_1 = \sigma_{2.\omega}/\sigma_2$. Thus
$X_1 \Leftrightarrow X_2$ when

$\dfrac{X_1 - M_1}{\sigma_1} = \dfrac{X_2 - M_2}{\sigma_2}$     [13:13]

The estimated true standard score method:
When the relative chance errors in $X_1$ and $X_2$ are
unequal an improved standard score method based
upon estimates of population statistics is to be
preferred. In this instance it is assumed that
the underlying functions are, except for chance,
the same; and, if these underlying true measures
are designated $X_\infty$ and $X_\omega$; that $M_\infty \Leftrightarrow M_\omega$; that
$\sigma_\infty \Leftrightarrow \sigma_\omega$; and that the forms of distribution of
$X_\infty$ and of $X_\omega$ are the same. Then $X_\infty \Leftrightarrow X_\omega$ when

$\dfrac{X_\infty - M_\infty}{\sigma_\infty} = \dfrac{X_\omega - M_\omega}{\sigma_\omega}$

The sample estimates of the true statistics are

$M_\infty = M_1$;  $\sigma_\infty = \sigma_1\sqrt{r_{1I}}$;  $\tilde{X}_\infty = r_{1I}X_1 + (1 - r_{1I})M_1$

in which $r_{1I}$ is the reliability coefficient of $X_1$,
and similarly for the second variable, so that
$X_1 \Leftrightarrow X_2$ when

$\dfrac{(X_1 - M_1)\sqrt{r_{1I}}}{\sigma_1} = \dfrac{(X_2 - M_2)\sqrt{r_{2II}}}{\sigma_2}$     Equivalent estimated true standard scores [13:14]


The ratio method: A common procedure in economics
is to consider certain price indexes or
ratios to be equivalent. Economic considerations
generally justify the assumption of a common
zero point, or that zero price for one commodity
is equivalent to zero price for a second. If
there is justification for equating another
point, such as $M_1 \Leftrightarrow M_2$, or $B_1 \Leftrightarrow B_2$, in which
$B_1$ and $B_2$ are some justified par, maturity values,
or the like, and if the distribution of $X_1$
is of the same form as that of $X_2$, then $X_1 \Leftrightarrow X_2$
when


$\dfrac{X_1}{B_1} = \dfrac{X_2}{B_2}$     Equivalent ratios [13:15]
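
The three algebraic methods of equating may be sketched in Python. The sketch assumes the following readings of the formulas: standard scores are equated by (X_1 - M_1)/s_1 = (X_2 - M_2)/s_2, estimated true standard scores by multiplying each side by the square root of the reliability coefficient of that measure, and ratios by X_1/B_1 = X_2/B_2. The function names, argument names and the illustrative figures are hypothetical.

```python
import math


def standard_score_equivalent(x1, m1, s1, m2, s2):
    """Score on the second measure having the same standard score as x1."""
    return m2 + s2 * (x1 - m1) / s1


def true_standard_score_equivalent(x1, m1, s1, rel1, m2, s2, rel2):
    """Same, but equating estimated true standard scores.

    rel1 and rel2 are the reliability coefficients of the two measures.
    """
    z_true = (x1 - m1) * math.sqrt(rel1) / s1
    return m2 + z_true * s2 / math.sqrt(rel2)


def ratio_equivalent(x1, b1, b2):
    """Score on the second measure standing in the same ratio to its base."""
    return x1 * b2 / b1


# A score of 115 on a test with mean 100 and standard deviation 15,
# expressed on a second test with mean 50 and standard deviation 10.
print(standard_score_equivalent(115, 100, 15, 50, 10))
print(true_standard_score_equivalent(115, 100, 15, 0.90, 50, 10, 0.60))
print(ratio_equivalent(50, 100, 80))
```

When the two reliabilities are equal the second function returns the same value as the first; when the second measure is the less reliable, an above-average score is equated to a more extreme score on it.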


SECTION 5. QUOTIENTS 


There are numberless situations where some
quotient X/Y is given as the crucial terminal
statistic. One of the sounder trends in the
study of variability and of distribution is the
employment of the variance ratio. In education
and psychology, intelligence quotients, accomplishment
quotients, etc. have been widely used.
These procedures have very different merit because
the quotients have very different properties.
The variance ratio is a very special quotient
in that the presumptive form of distribution
of the numerator is known, as is that of
the denominator; the two variances are uncorrelated;
and the form of distribution of the quotient
is known and available in tables. The
general quotient, X/Y, is far less tractable in
its algebraic handling than the variance ratio.
The variance error of X/Y is generally unknown
and is always difficult to compute, the difficulty
in some cases being insurmountable. Of
course the determination of the complete distribution
of X/Y is still more difficult.

The accomplishment quotient, which has been
defined as an achievement age divided by a mental
age, has been used by some psychologists and
educators for years without attaching a standard
error, or other measure of variability, to it.
With such limitation the quotient is not a creditable
statistic. Investigators should endeavor


quotients having unknown Standard errors. 
The urge to employ a quotient resides in a 
number of facts such as that the quotient con- 


errors, such as the abstract numbers of mathema- 
tics, is well understood by everyone with ele- 


ligence quotient in a way and to a degree not 
immediately obvious in theomenful sm A те 
upon observational data involves the first step
of understanding the meaning of the numerator
and of the denominator separately and the meaning
of what is meant by division, and a second step
of appreciating the trustworthiness to be attached
to numerator, to denominator, and finally
to the quotient. For practically all quotients
the first step is simple, but for only a few is
this true of the second step.

Consider two quotients: (a) an accomplishment
quotient which equals an achievement age divided
by a mental age and (b) a variance ratio in which
one variance is that of the part of
achievement age which is independent of mental
age and the other is that of the part of
achievement age which is dependent upon mental
age. These two quotients bear upon the same
fundamental issue and answer the same fundamental
question. For the first quotient the first step
in understanding is simple, but the second is so
involved that it is usually entirely neglected.
For the second, the first step in understanding
is somewhat more difficult, but the second quite
simple in that one can avail himself of the
known form of distribution of the variance ratio.

Should fundamental findings be reached by 
variance ratios, but reported if necessary on 
account of the level of understanding of the au- 
dience by such quotients as accomplishment quo- 
tients the net outcome might be substantially 
accurate and informative. Failing this proce- 
dure, the reader of an article reporting quotients 
must have the idea that the writer is as little 
aware of the error as is the reader. 

The standard error of a quotient is infre- 
quently reported in the experimental literature, 
though a fairly simple formula giving it is 
imation (i.e., an 


E un LE АТГ REN 
the variance error of the ratio X/Y is


$V_{X/Y} = \dfrac{X^2}{Y^2}\left[\dfrac{V_X}{X^2} - \dfrac{2\,r_{XY}\,\sigma_X\sigma_Y}{XY} + \dfrac{V_Y}{Y^2}\right]$     Variance error of a quotient [13:16]


There are certain shortcomings in the use of
this formula. X and Y must be positive. The
formula only applies when the quantity $\sigma_Y/Y$ is
sufficiently small that neglect of powers of
this quantity beyond the first will constitute
no material inaccuracy. The distribution entire
of X/Y is generally non-normal so that the in- 
terpretation of the critical ratio of the quo- 
tient divided by its standard error by means of 
the probabilities given in a normal distribution 
will not be precise. Finally the correlation 
between X and Y is required and there may be 
difficulties in obtaining this. With all its 
limitations a far wider use of this formula to 
give the standard error of a quotient would be 
a great improvement over the common practice of 
entirely neglecting the matter. 
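
A short Python sketch shows how little labor the formula demands. It assumes the first-order form V(X/Y) = (X/Y)^2 [V_X/X^2 - 2 r s_X s_Y/(XY) + V_Y/Y^2], in which V_X and V_Y are the variance errors of X and Y and r is their correlation. The function name and the numerical values are illustrative only.

```python
import math


def quotient_variance_error(x, y, var_x, var_y, r_xy):
    """First-order variance error of the quotient x / y.

    x, y          the two positive statistics
    var_x, var_y  their variance errors
    r_xy          the correlation between them
    """
    if x <= 0 or y <= 0:
        raise ValueError("the approximation requires positive X and Y")
    cross = 2.0 * r_xy * math.sqrt(var_x * var_y) / (x * y)
    return (x / y) ** 2 * (var_x / x ** 2 - cross + var_y / y ** 2)


# X = 100 with standard error 2, Y = 50 with standard error 1.
for r in (0.0, 0.5, 1.0):
    v = quotient_variance_error(100.0, 50.0, 4.0, 1.0, r)
    print("r = %.1f  quotient = 2.00  standard error = %.4f" % (r, math.sqrt(max(v, 0.0))))
```

The illustration also shows why the correlation cannot be neglected: with equal relative errors in X and Y the standard error of the quotient shrinks toward zero as the correlation approaches unity.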
The quotient 
гіз Vine 
Qu POE a rg 
213 Vi дз 
may be called a quotient of determination coef- 
ficients. Itisindicative of the relative value 
of x, and х, as predictors of x,. Accordingly, 
a significant excess of this ratio over 1.00 in- 
dicates that x, is the better predictor. The 
variance of Q as derived by Arthur C. Hoffman 
(in an unpublished paper) is 
40? 1 1 2, 2гзг, 257,5 
Е [13:16а] 
ин эг 


If N is, say, greater than 25, it would seem safe 

to interpret the critical ratio т in terms of 
STIR Ec 2 

the probabilities of the normal distribution. 
SECTION 6. THE MOST RELIABLE WEIGHTED AVERAGE
OF INDEPENDENT MEASURES


If the measures to be averaged, $X_1, X_2, X_3, \ldots$,
have known variance errors, we may write each
of them as equal to a true measure plus an error,
thus:


$X_1 = X_\infty + e_1$;  $X_2 = X_\omega + e_2$;  etc.


If these measures are averaged, weighting them $w_1$,
$w_2$, $w_3$, etc., we have:


$M = \dfrac{w_1X_1 + w_2X_2 + \ldots}{w_1 + w_2 + \ldots} = \dfrac{w_1X_\infty + w_2X_\omega + \ldots}{w_1 + w_2 + \ldots} + \dfrac{w_1e_1 + w_2e_2 + \ldots}{w_1 + w_2 + \ldots}$     [13:17]


The variance error of M is equal to the variance
of the second term of the right-hand member of
[13:17]. The various e's are uncorrelated as
they are error functions. Knowing the variances
of these several e's, which in harmony with
earlier notation we designate $V_{1.\infty}$, $V_{2.\omega}$,
etc., we desire so to determine the w's that the
variance error of M is minimal. We will first
solve this problem for the case of two variables
and will let


$p = \dfrac{w_1}{w_1 + w_2}$;  $q = 1 - p = \dfrac{w_2}{w_1 + w_2}$


$V_M = p^2 V_{1.\infty} + q^2 V_{2.\omega} = (V_{1.\infty} + V_{2.\omega})\left[p - \dfrac{V_{2.\omega}}{V_{1.\infty} + V_{2.\omega}}\right]^2 + \dfrac{V_{1.\infty}V_{2.\omega}}{V_{1.\infty} + V_{2.\omega}}$


The [ ] term is a perfect square and is the only
term containing p, so the entire function takes
its smallest value when the [ ] = 0. Solving,


$\dfrac{p}{q} = \dfrac{1/V_{1.\infty}}{1/V_{2.\omega}}$     [13:18]


The optimal weights are thus the inverse of the
variance errors.

If we have k variables, k-1 of which are combined
(using either optimal weights, or non-optimal
weights) and treated as a single variable,
we can combine this with the k'th variable in
the optimal manner, and the relative weight of
the k'th variable, by [13:18], will be as the
inverse of its variance error. We accordingly
have for the optimal average of any number of
variables:


$M = \dfrac{\dfrac{1}{V_{1.\infty}}X_1 + \dfrac{1}{V_{2.\omega}}X_2 + \ldots}{\dfrac{1}{V_{1.\infty}} + \dfrac{1}{V_{2.\omega}} + \ldots}$     Most reliable weighted average [13:19]


The most reliable weighted average of measures
whose errors are independent is obtained by
making the weights equal to, or proportional to,
the inverse of their variance errors.
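
The rule may be illustrated in Python. The sketch below is not part of the original derivation: the function names and the two illustrative measures are hypothetical, and the errors are assumed to be independent as the rule requires.

```python
def most_reliable_average(values, error_variances):
    """Average with weights inversely proportional to the variance errors.

    Returns (weighted mean, variance error of that mean).
    """
    weights = [1.0 / v for v in error_variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / total


def weighted_average_variance(weights, error_variances):
    """Variance error of a weighted average of measures with independent errors."""
    total = sum(weights)
    return sum((w / total) ** 2 * v for w, v in zip(weights, error_variances))


# Two measures of the same quantity with variance errors 1 and 4.
mean, variance_error = most_reliable_average([10.0, 20.0], [1.0, 4.0])
print("optimal average = %.2f, variance error = %.2f" % (mean, variance_error))
print("equal weights give variance error %.2f"
      % weighted_average_variance([1.0, 1.0], [1.0, 4.0]))
```

With optimal weights the variance error of the average is the reciprocal of the sum of the reciprocals of the separate variance errors, here .80, which is smaller than the 1.25 obtained with equal weights and smaller than that of the better single measure.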


SECTION 7. FITTING OF CURVES TO OBSERVATIONS 


Two broad categories are readily distinguishable.
They are the fitting of curves to frequency
distributions and the fitting of regression lines.
In the first case goodness-of-fit is attested by
the degree of agreement between observed frequencies
in the classes employed and the theoretical
frequencies, or frequencies in the case of the
fitted curve, in the classes. In the second
case goodness-of-fit is attested by the smallness
of the quantitative deviations from the regression
line of the observed values.

We shall here consider the fitting of frequency
distributions by the Pearson method. An
alternative procedure developed by Charlier (See
Rietz, op. cit., ch. VII) will not be expounded,
but it has special advantages when it is desirable
to express a distribution as a cumulative
function of normal distributions.

The Pearson system (Pearson 1894, 1895, 1901,
1902 Cont., 1914; Elderton 1927; Craig 1936; Rietz
1924, Ch. VII by H. C. Carver) of unimodal, including
anti-modal, frequency distributions is
very comprehensive in that it covers the broad
range of types shown on Charts XIII III and
XIII IV. It employs two, three, or four parameters
only, in the definition of its curves,
and these are determined by the method of moments.
This method establishes the equality of
the moments of the fitted curve to those of the
data up to the number needed, e.g., four (mean,
variance, $\mu_3$, and $\mu_4$) in the case of a curve
with four constants.

Further, since these moments are linear functions
of the frequencies in the classes, the exact
number of degrees of freedom represented by
the variance of the differences of the observed
frequencies in the classes and the theoretical
frequencies, or those given by the corresponding
areas under the fitted curve, is (k-par), the
number of classes employed minus the number of
parameters. This enables a precise $\chi^2$ for a
test of goodness-of-fit.

The type of curve is a function of the moments.
The first moment, in all cases, merely
locates the curve along the x-axis. The second
moment, in all cases, merely measures variability,
or, we can say, merely reflects the units in
which x has been measured. Thus the important
differences in type depend upon the third and
fourth moments. Accordingly, all the different
Pearson types can be represented by regions in a
plane with dimensions $\mu_3/\sigma^3$ (= $\sqrt{\text{Pearson's } \beta_1}$ =
Carver's $\alpha_3$) and $\mu_4/\sigma^4$ (= Pearson's $\beta_2$ = Carver's
$\alpha_4$). Rhind (in a chart used by Pearson) has
charted all regions, using as axes $\beta_1$ and $\beta_2$,
[7:12] and [7:03]. We herewith give a chart of
this sort, Chart XIII III.

Charts VII I, VII II, and VII III, in Chapter
VII, use Pearson notation and illustrate the
range of curves covered by different Pearson
types.

Two criteria used by Pearson are $\kappa_1$ and $\kappa_2$,
defined as follows in terms of $\beta_1$, $\beta_2$, and Craig's
$\delta$:


$\kappa_1 = 2\beta_2 - 3\beta_1 - 6 = \dfrac{3\delta(\beta_1 + 4)}{2 - \delta}$     [13:20]


$\kappa_2 = \dfrac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} = \dfrac{\beta_1}{4\delta(\delta + 2)}$     [13:21]


Craig gives a chart of similar purpose having
dimensions $\beta_1$ and $\delta$.


$\delta = \dfrac{2\beta_2 - 3\beta_1 - 6}{\beta_2 + 3}$     Craig's $\delta$ [13:22]


This $\delta$ is the same as Pearson's $\alpha$ (See 1914, Part
I, p. lxi). The use of Craig's $\delta$, i.e., Pearson's
$\alpha$, leads to simplifications in equations and in
charts. We also reproduce, with permission,
Craig's Chart, Chart XIII IV herewith.
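
The criteria are readily computed from $\beta_1$ and $\beta_2$, and the computation also serves to check that the two forms given for each criterion agree. The Python sketch below assumes the definitions $\kappa_1 = 2\beta_2 - 3\beta_1 - 6$, $\kappa_2 = \beta_1(\beta_2+3)^2 / [4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)]$ and $\delta = (2\beta_2 - 3\beta_1 - 6)/(\beta_2 + 3)$; the function names are hypothetical, and the $\beta$ values used for the rectangular, normal and exponential distributions are the standard ones.

```python
def kappa1(beta1, beta2):
    return 2.0 * beta2 - 3.0 * beta1 - 6.0


def kappa2(beta1, beta2):
    return (beta1 * (beta2 + 3.0) ** 2
            / (4.0 * (4.0 * beta2 - 3.0 * beta1) * kappa1(beta1, beta2)))


def craig_delta(beta1, beta2):
    return kappa1(beta1, beta2) / (beta2 + 3.0)


# Location of three familiar distributions on the delta scale.
for name, b1, b2 in (("rectangular", 0.0, 1.8),
                     ("normal", 0.0, 3.0),
                     ("exponential", 4.0, 9.0)):
    print("%-12s beta1 = %.2f  delta = %+.3f" % (name, b1, craig_delta(b1, b2)))

# The alternative forms in delta, checked at an arbitrary point.
b1, b2 = 0.5, 4.0
d = craig_delta(b1, b2)
kappa1_from_delta = 3.0 * d * (b1 + 4.0) / (2.0 - d)
kappa2_from_delta = b1 / (4.0 * d * (d + 2.0))
print(kappa1(b1, b2), kappa1_from_delta)
print(kappa2(b1, b2), kappa2_from_delta)
```

The rectangular distribution falls at $\delta$ = -.5 and the normal and exponential at $\delta$ = 0. Note that $\kappa_2$ is indeterminate at the normal point, where both $\beta_1$ and $\delta$ vanish, so the second function should not be called there.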


CHART XIII III

Diagram of Types of Frequency Distribution

[Chart not reproduced.]


CHART XIII IV

THE ($\alpha_3^2$, $\delta$) CHART FOR THE PEARSON SYSTEM OF
FREQUENCY CURVES
(The subscript L refers to bell-shaped curves)

[Chart not reproduced.]

On Chart XIII III are lines indicating where
certain moments become infinite. This, of course,
refers to the moments of the fitted curve only,
for any positive moment obtained from finite
data cannot be infinite. The lines indicating
infinite negative moments

($\mu_{-n} = \dfrac{\sum x^{-n}}{N}$, x a deviation from a boundary
of the curve)     [13:23]

also refer to the fitted curve, though an infinite
negative moment is a possibility with finite
data. The bearing of infinite positive and negative
moments upon the stability of distributions
has been discussed by the writer in Statistical
Method (1923). It is especially noteworthy that
Type III is the only general type for which all
positive moments are finite, and that Type V is
the only one for which all negative moments are
finite, and that the limiting distribution, represented
by the intersection of the Type III and
Type V lines, namely the normal distribution, is
the only one having the stability that attaches
to finite positive and negative moments of all
orders.

The location on Craig's diagram of certain
types and also certain properties of these types
are given in Tables XIII D and XIII E, herewith.

TABLE XIII D
PEARSON TYPES INVOLVING THREE PARAMETERS

TYPE               LOCATION AND PROPERTIES

Two category       δ = −1. Lower limit in δ of possible
                   distributions. U-shaped curves.
XII                δ = −.5. J-shaped.
III                δ = 0. Upper limit in δ for which all positive
                   moments < ∞. ∩-shaped for β₁ < 4.
                   J-shaped for β₁ > 4.
                   δ = .4. Upper limit in δ for which μ₈ < ∞.
                   δ = 1. Upper limit in δ for which μ₅ < ∞.
                   δ = 2. Upper limit in δ for which μ₄ < ∞.
II and VII         β₁ = 0. All symmetrical distributions.
II                 −1 < δ < −.5 U-shaped; −.5 < δ < 0 ∩-shaped.
VII                0 < δ < 2 ∩-shaped.
VIII, IX, and XI   β₁(2+3δ) = (2+4δ)²(2+δ)   [13:24]
VIII               −1 < δ < −.5; transition between U- and J-shaped.
IX                 −.5 < δ < 0; transition between J- and ∩-shaped.
XI                 0 < δ < 2; transition between J- and ∩-shaped.
V                  β₁ = 4δ(2+δ)   [13:25]

TABLE XIII E
TYPES INVOLVING TWO PARAMETERS

TYPE

M        β₁ = 0, δ = −1. Mendelian distribution (equal
         frequencies in two discrete classes).
R        β₁ = 0, δ = −.5. Rectangular distribution.
N        β₁ = 0, δ = 0. Normal distribution.
X or E   β₁ = 4, δ = 0. Exponential distribution.
L        β₁ = .32, δ = −.4. Straight line distribution.
P        β₁ = 0, δ = −1/3. Parabolic distribution.


Let the general equation covering all the
Pearson curve types be

$$y = f(x)$$ [13:26]

for which the differential equation is

$$\frac{1}{y}\frac{dy}{dx} = \frac{a - x}{b_0 + b_1 x + b_2 x^2}$$ [13:27]

Written in terms of α₃ and δ (Craig's) and valid
except when δ = −.5 this equation is

$$\frac{1}{y}\frac{dy}{dx} = \frac{-\dfrac{\alpha_3}{2(1+2\delta)} - x}{\dfrac{2+\delta}{2(1+2\delta)} + \dfrac{\alpha_3}{2(1+2\delta)}\,x + \dfrac{\delta}{2(1+2\delta)}\,x^2}$$ [13:27a]


This covers an extensive class of unimodal, or
∩-shaped, of J-shaped, and of antimodal, or
U-shaped curves. Some of these are unlimited in
both directions, some limited in one direction,
and some limited in both directions, depending
upon the specific values of the parameters a, b₀,
b₁, b₂. The unimodal requirement limits the
numerator of the right hand member of the
differential equation to x to the first power only.
The limitation of the denominator to x to the
second power is arbitrary, but sufficiently
general to encompass a very wide variety of
distributions.

The curve may be fitted by the method of
moments. This assures that the moments of the
curve up to the requisite number agree with the
moments of the data. In general it requires the
first four moments, M, V, μ₃, μ₄, to determine
the four parameters a, b₀, b₁, b₂. Having these,
the integration of [13:27] yields the equation
of distribution [13:26].

It simplifies matters to deal with standard
scores, so following Carver (See Rietz, 1924,
Ch. VII) and Craig (1936) we let

$$\alpha_n = \frac{\sum (X - M)^n}{N\sigma^n}$$ [13:28]

The following recursion formula for moments
holds:

$$a\,\alpha_n + n\,\alpha_{n-1}\,b_0 + (n+1)\,\alpha_n\,b_1 + (n+2)\,\alpha_{n+1}\,b_2 = \alpha_{n+1}$$ [13:29]



Setting n successively equal to 0, 1, 2, 3 leads
to four condition equations from which we derive
the following, valid except when δ = −.5:

$$2(1+2\delta)\,a = -\alpha_3$$ [13:30]
$$2(1+2\delta)\,b_0 = 2 + \delta$$ [13:31]
$$2(1+2\delta)\,b_1 = \alpha_3$$ [13:32]
$$2(1+2\delta)\,b_2 = \delta$$ [13:33]
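
The condition equations can be checked numerically. The following is a minimal sketch, assuming the differential equation has the numerator (a − x) and that scores are standardized so that α₀ = 1, α₁ = 0, α₂ = 1; the function names are illustrative and not from the text. It computes a, b₀, b₁, b₂ from α₃ and α₄ by way of Craig's δ and then confirms that the recursion [13:29] is satisfied for n = 0, 1, 2, 3.

```python
def pearson_coefficients(alpha3, alpha4):
    """Return (a, b0, b1, b2) from the standardized third and fourth moments.

    Assumes the reading 2(1+2d)a = -alpha3, 2(1+2d)b0 = 2+d,
    2(1+2d)b1 = alpha3, 2(1+2d)b2 = d, where d is Craig's delta.
    """
    beta1 = alpha3 ** 2
    delta = (2 * alpha4 - 3 * beta1 - 6) / (alpha4 + 3)
    k = 2 * (1 + 2 * delta)
    if abs(k) < 1e-12:
        raise ValueError("delta = -.5: the equations are not valid here")
    return -alpha3 / k, (2 + delta) / k, alpha3 / k, delta / k


def recursion_residual(n, alphas, a, b0, b1, b2):
    """Left member minus right member of the moment recursion for a given n.

    alphas is the list [alpha0, alpha1, alpha2, alpha3, alpha4].
    """
    previous = alphas[n - 1] if n > 0 else 0.0  # the n*alpha_(n-1) term vanishes at n = 0
    left = (a * alphas[n] + n * previous * b0
            + (n + 1) * alphas[n] * b1 + (n + 2) * alphas[n + 1] * b2)
    return left - alphas[n + 1]


if __name__ == "__main__":
    alpha3, alpha4 = 1.0, 4.5          # a point on the Type III line (delta = 0)
    coeffs = pearson_coefficients(alpha3, alpha4)
    print(coeffs)
    alphas = [1.0, 0.0, 1.0, alpha3, alpha4]
    print([recursion_residual(n, alphas, *coeffs) for n in range(4)])
```

For α₃ = 1 and α₄ = 4.5 the point lies on the Type III line, so b₂ vanishes and the denominator of the differential equation becomes linear in x.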


For the general case we note that δ depends
upon μ₄, μ₃, V, and M, and that α₃ depends upon
μ₃, V, and M, so that the four parameters a, b₀,
b₁, b₂ are determined from the first four moments.
One further parameter y₀ is dependent upon N, the
number of cases in the sample, or is arbitrary.
The general types are called four-parameter
distributions. In all cases a simple way to
determine y₀ is to choose an arbitrary value, y₀′,
compute the frequencies in all classes with a
grouping as fine as the precision of the data
justifies, sum these frequencies and multiply
this sum by such a number λ that the product
equals N, or the arbitrarily chosen value, and
then determine y₀ from [13:34].

$$y_0 = \lambda\,y_0'$$ The ordinate at the origin [13:34]


To determine the Pearson type curve that fits
given data one can locate on Chart XIII III, or
Chart XIII IV, the point given by his data and
note the type region, or, if near a line, the
type line in which or near which the point lies.
What constitutes "near a type line" is considered
in probability terms in Pearson's Tables, Part I.
Great simplification occurs whenever three rather
than four parameters can be used, or when two
rather than three or four suffice.
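
To locate a sample on either chart one needs its coordinates. The sketch below is an illustration only: it computes β₁, β₂, Pearson's κ₁ and κ₂, and Craig's δ from raw scores, using moments about the mean with divisor N and the definitions [13:20] to [13:22]. The function name and the returned dictionary keys are hypothetical.

```python
def pearson_chart_coordinates(data):
    """Return beta1, beta2, kappa1, kappa2 and Craig's delta for a sample."""
    n = len(data)
    mean = sum(data) / n
    # Moments about the mean, each divided by N.
    mu2 = sum((x - mean) ** 2 for x in data) / n
    mu3 = sum((x - mean) ** 3 for x in data) / n
    mu4 = sum((x - mean) ** 4 for x in data) / n
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    kappa1 = 2 * beta2 - 3 * beta1 - 6
    delta = kappa1 / (beta2 + 3)
    denominator = 4 * (4 * beta2 - 3 * beta1) * kappa1
    # kappa2 is undefined when its denominator vanishes.
    kappa2 = beta1 * (beta2 + 3) ** 2 / denominator if denominator != 0 else None
    return {"beta1": beta1, "beta2": beta2,
            "kappa1": kappa1, "kappa2": kappa2, "delta": delta}


if __name__ == "__main__":
    print(pearson_chart_coordinates([1, 2, 3, 4, 5]))
```

For the five equally spaced scores in the usage line, β₁ = 0 and δ is a little below −.5, so the point falls on the symmetrical line between the rectangular and the Mendelian distributions.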
Fitting the most important two-parameter curves.
In addition to the normal distribution, the
equation and fitting of which have already been given,
these are M, R, P, L, and E.

The Mendelian distribution: The equation of 
this takes the non-useful form У. 
The origin is at the mean, the limits are -о, 
and c, and y, ~ 0. However one does not need 
the equation to investigate any problems of 
goodness-of-fit and the like which may arise. 

The rectangular distribution: When N = 1; M = 0;
range from −σ√3 to σ√3, the equation is

$$y = \frac{1}{2\sigma\sqrt{3}}$$ Unit rectangular distribution [13:35]

The parabolic distribution: When N = 1; M = 0;
range from −σ√5 to σ√5, the equation is

$$y = \frac{3}{4\sigma\sqrt{5}}\left(1 - \frac{x^2}{5\sigma^2}\right)$$ [13:36]

The straight line distribution: When N = 1;
M = 2√2 σ; origin at the left end at which point
y = 0; range from 0 to 3√2 σ, the equation is

$$y = \frac{x}{9\sigma^2}$$ Unit straight line distribution [13:37]

The exponential distribution: When N = 1; M = σ;
origin at the left end, at which point y = 1/σ;
range from 0 to ∞, the equation is

$$y = \frac{1}{\sigma}\,e^{-x/\sigma}$$ Unit exponential distribution [13:38]
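
The stated ranges and means of these four curves can be confirmed by numerical integration. The sketch below assumes the densities in the forms 1/(2σ√3), (3/(4σ√5))(1 − x²/5σ²), x/(9σ²) and (1/σ)e^(−x/σ); it uses a plain midpoint rule, and the cut-off of 40σ for the exponential curve is an arbitrary choice made for illustration.

```python
import math


def area_mean_sd(density, lower, upper, steps=20000):
    """Midpoint-rule area, mean and standard deviation of a density on [lower, upper]."""
    h = (upper - lower) / steps
    area = first = second = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        w = density(x) * h
        area += w
        first += x * w
        second += x * x * w
    mean = first / area
    return area, mean, math.sqrt(second / area - mean * mean)


def unit_curves(sigma):
    """Return {name: (density, lower limit, upper limit)} for the four curves."""
    r2, r3, r5 = math.sqrt(2), math.sqrt(3), math.sqrt(5)
    return {
        "rectangular": (lambda x: 1 / (2 * sigma * r3), -sigma * r3, sigma * r3),
        "parabolic": (lambda x: 3 / (4 * sigma * r5) * (1 - x * x / (5 * sigma ** 2)),
                      -sigma * r5, sigma * r5),
        "straight line": (lambda x: x / (9 * sigma ** 2), 0.0, 3 * r2 * sigma),
        # The exponential range is unlimited; 40 sigma is an arbitrary cut-off.
        "exponential": (lambda x: math.exp(-x / sigma) / sigma, 0.0, 40 * sigma),
    }


if __name__ == "__main__":
    for name, (density, lower, upper) in unit_curves(2.0).items():
        print(name, area_mean_sd(density, lower, upper))
```

Each curve should return an area close to 1 and a standard deviation close to the σ supplied, with the mean at 0, 0, 2√2 σ and σ respectively.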


The fitting of the three-parameter distributions
to data is not difficult.

The two-category distribution: δ = −1. As
with the Mendelian distribution the equation of
the curve involves the indeterminate feature
0 × ∞. All distributions in which the frequencies
fall into two discrete categories are here
represented. Letting the scores attaching to
these categories be 0 and 1, and letting the
proportions in them be q and p, then all the
moments are functions of p, as given in Table IX A,
and the distribution is completely defined. An
a priori value of p is necessary if any test of
goodness-of-fit is involved.

Type XII: δ = −.5. When N = 1; M = 0; α₃ = μ₃/σ³;
range from −σ(√(3+β₁) − α₃) to σ(√(3+β₁) + α₃) the
equation is

$$y = y_0\left[\frac{\sigma(\sqrt{3+\beta_1}+\alpha_3) - x}{\sigma(\sqrt{3+\beta_1}-\alpha_3) + x}\right]^{\alpha_3/\sqrt{3+\beta_1}}$$ Type XII [13:39]

If y₀ is determined so that the total area = 1,

$$y_0 = \frac{1}{2\sigma\sqrt{3+\beta_1}\;\Gamma\!\left(1+\dfrac{\alpha_3}{\sqrt{3+\beta_1}}\right)\Gamma\!\left(1-\dfrac{\alpha_3}{\sqrt{3+\beta_1}}\right)}$$ Ordinate at the point X = σα₃ [13:40]

To five decimal places the gamma function values
may be gotten from Table XIV A.

As a sample we give herewith the Type XII
equation corresponding to the (β₁ = 1, δ = −.5)
point, when μ₃ is positive, M = 0, σ = 1, N = 1, and
the range is from −1 to 3.

$$y = \frac{1}{2\pi}\left(\frac{3-x}{1+x}\right)^{1/2}$$ Unit Type XII distribution when β₁ = 1 and μ₃ is positive


This is a twisted J-shaped curve, i.e., one in 
which the low end of the J turns down a little. 

Type III: δ = 0. When N = 1; origin at the mode;
range from −a to ∞, the equation is


D 
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Х -2Х 

y =y, (1+—)^7?ехр Type TII [13:41] 
a (397 


in which X is a deviation from the mode, and 


Mo cuu c Ere ceo uc arias ЕЯ 


Eye SEI EE мии 277 И 
ед 
222 
а =—- Ne Sawa: Cerne A ТЗ ИА 
а» 2c 
4 
A E л КО И 22082 22227 113:45) 
Bi 
jal (4-1) (4-11373 Ordinate at 
MER m the mode [13: 46] 


in which the reciprocal of the [] term is given
by [14:50]. When β₁ = 0 this reduces to the normal
distribution and when β₁ = 4 it becomes the
exponential. Herewith is given as an example the
Type III equation when μ₃ is positive, β₁ = 1,
σ = 1, N = 1, Mo = 0, the range then being from
−1.5 to ∞,

$$y = \tfrac{1}{3}\,e^{-2X-3}\,(3+2X)^3$$ Unit Type III having Mo = 0 [13:47]

The distribution of variances from samples drawn
from a normal population is of Type III. This
type has been found to be widely descriptive of
phenomena. Areas, ordinates and derivatives for
Type III distributions have been tabled by
Salvosa (1930).

Type II. β₁ = 0; −1 < δ < 0 (i.e., 1 < β₂ < 3).
When N = 1; origin at the mean; range from −a to
a, the equation is

$$y = y_0\left(1 - \frac{x^2}{a^2}\right)^m$$ Type II distribution [13:48]

M = Mo, or anti-mode, = Mdn = 0 [13:49]

$$m = \frac{5\beta_2 - 9}{2(3-\beta_2)}$$ [13:50]

$$a^2 = \frac{2V\beta_2}{3-\beta_2}$$ [13:51]

$$y_0 = \frac{\Gamma(m+1.5)}{a\sqrt{\pi}\;\Gamma(m+1)}$$ Ordinate at the mean [13:52]

When δ = −1 (i.e., β₂ = 1), a Mendelian
distribution of equal frequencies in two classes
results, see Chart VII I, Type M.

When −1 < δ < −.5 (i.e., 1 < β₂ < 1.8), m is
negative and a U-shaped curve results, see Chart
VII I, Type II-U.

When −.5 < δ < 0 (i.e., 1.8 < β₂ < 3), m is
positive and a platykurtic ∩-shaped curve
results, see Chart VII I, Type II-∩.

When δ = −1/3 (i.e., β₂ = 2 1/7), the distribution
is parabolic, see Chart VII I, Type P.

When δ = 0 (i.e., β₂ = 3), a normal distribution
results.

Type VII. β₁ = 0; 0 < δ < 2 (i.e., 3 < β₂ < ∞).
When N = 1; origin at the mean; range from −∞ to
∞, the equation is

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m}$$ [13:53]
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о в seen on АЗБА) 
НЫЕ 
Mecraem у ш [13:55] 
2V 
2 2 
qe E Б D 
Гт 
У ее ОАО [13:57] 


а уп Г(т-1.5) 


For illustration, see Chart VII I, curve A. 


Types VIII, IX, and XI. β₁(2+3δ) = (2+4δ)²(2+δ)
as given in [13:24]. In the equations for
these three types occurs an exponent, m, which
is found by solving the accompanying cubic,

$$m^3(\beta_1 - 4) + m^2(9\beta_1 - 12) + 24\beta_1 m + 16\beta_1 = 0$$ [13:58]

The solution may follow the steps indicated in
Chapter XIV, Section 7. A plot of [13:58] shows
that, in addition to an extraneous branch, there
is a branch passing through the following points:
β₁ = ∞, m = −1; β₁ = 0, m = 0; β₁ = 4, m = ±∞;
β₁ = ∞, m = −4. The region between the first
two points yields Type VIII distributions and
herein −1 < m < 0. Between the second and third
point yields Type IX distributions and herein
0 < m < ∞. Between the third and fourth point
yields Type XI distributions and herein −∞ < m <
−4. The distributions represented by special
points upon this cubic are R, L, and E, as noted
in Table XIII E. A convenient form of the
equation of the curve for Types VIII and IX is

$$y = y_0\left(1 + \frac{X}{a}\right)^m$$ Types VIII and IX [13:59]

and for Type XI,

$$y = y_0\,X^m$$ [13:60]

The separate and specific features of these three
types will now be considered.
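
The exponent m can be found numerically as well as by the steps of Chapter XIV. The sketch below is one possible approach, not the method of the text: it assumes the cubic in the form m³(β₁ − 4) + m²(9β₁ − 12) + 24β₁m + 16β₁ = 0 and locates by bisection the root on the Type IX branch, 0 < m < ∞, for a given β₁ between 0 and 4. The function names are illustrative.

```python
def cubic(m, beta1):
    """Left member of the cubic connecting the exponent m with beta1."""
    return (m ** 3 * (beta1 - 4) + m ** 2 * (9 * beta1 - 12)
            + 24 * beta1 * m + 16 * beta1)


def beta1_from_m(m):
    """beta1 implied by m, obtained by solving the cubic for beta1."""
    return 4 * m * m * (m + 3) / ((m + 4) ** 2 * (m + 1))


def solve_m_type_ix(beta1):
    """Bisection for the root with 0 < m < infinity (requires 0 < beta1 < 4)."""
    if not 0 < beta1 < 4:
        raise ValueError("the Type IX branch requires 0 < beta1 < 4")
    low, high = 0.0, 1.0
    while cubic(high, beta1) > 0:      # the cubic is positive below the root
        high *= 2
    for _ in range(200):
        mid = 0.5 * (low + high)
        if cubic(mid, beta1) > 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    print(solve_m_type_ix(0.32))       # the straight line distribution, Table XIII E
```

With β₁ = .32, the straight line distribution of Table XIII E, the root is m = 1, which agrees with an ordinate proportional to the first power of the distance from the boundary.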

Type VIII. −1 < m < 0; 0 < β₁ < ∞; and also
−1 < δ < −.5. A limited range transition type
between U- and J-shaped curves. The equation as
written has N = 1, and the origin at the right
boundary when μ₃ > 0 and at the left boundary
when μ₃ < 0. When μ₃ > 0 the range is from −a to
0 and when μ₃ < 0 it is from 0 to −a (−a being a
positive magnitude).

$$a = \pm\,\sigma(2+m)\sqrt{\frac{3+m}{1+m}}$$ The sign of a is the same as that of μ₃ [13:61]

$$y_0 = \pm\,\frac{1+m}{a}$$ Sign so chosen that y₀ > 0. Ordinate at boundary. [13:62]

$$M = -\frac{a}{2+m}$$ [13:63]

Type IX. 0 < m < ∞; 0 < β₁ < 4; and also
−.5 < δ < 0. A limited range transition type
between J- and ∩-shaped curves. As written the
origin is at the left boundary when μ₃ > 0 and at
the right boundary when μ₃ < 0. The range is
from 0 to −a when μ₃ > 0 and from −a to 0 when
μ₃ < 0.

$$a = \pm\,\sigma(2+m)\sqrt{\frac{3+m}{1+m}}$$ The sign of a is opposite that of μ₃ [13:61a]

y₀ is given by [13:62]

M is given by [13:63]

Type XI. −∞ < m < −4; 4 < β₁ < ∞; and also
0 < δ < 2. The curve is unlimited at one end and
is a transition type between J- and ∩-shaped
curves. When μ₃ > 0 (and the scale of measurement
can always be so chosen that this is true) the
equation as written has N = 1, and the origin at
0, which is the distance a to the left of the
left boundary of the curve. The range is from a
to ∞.

$$a = -\sigma(2+m)\sqrt{\frac{3+m}{1+m}}$$ (When μ₃ > 0) [13:64]

$$y_0 = -(1+m)\,a^{-(1+m)}$$ [13:65]

$$M = \frac{(1+m)\,a}{2+m}$$ Mean measured from the origin [13:66]

The mean is −a/(2+m) to the right of the left
boundary a.

Type V. β₁ = 4δ(2+δ). A leptokurtic curve of
limited range in one direction. When N = 1;
origin at the left boundary when μ₃ > 0; range
from 0 to ∞; the equation is


yr ye № ё e c omar pesos” [13: 61] 


вне 
р 43 QU хо: ЕО 
1 


И 


Sign of у to be the same 


y = (p-2)e Vp-3 | аз that of m [13:69] 

у. ур-1 Пааа реа ап panas 13:70] 
Г(р-1) 

p = ee ees neret Ads 


р-2 
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2y 
И Te» 
As a sample we write down the Type V equation 
when N=1; о=1; 2471.5; origin = 0. dues р=16; 
уг14/13: у;=(14/13)19/ Г 15; #=/13; M=- ТЗ, 

and : 

_ (14/13) БЫ 

57165 


Мо = [13:72] 


Хас ехр 


The four parameter Types I, IV, and VI. For
the fitting of these we refer to original
memoirs, or to Elderton, op. cit.

Type I is given by points in the Craig Chart
for which δ < 0. The general equation, with origin
at the mode, is

$$y = y_0\left(1+\frac{X}{a_1}\right)^{m_1}\left(1-\frac{X}{a_2}\right)^{m_2}$$ Type I [13:73]


Type VI is given by points in the region below
the line δ = 0 and above the line β₁ = 4δ(2+δ).
The general equation, with origin at a before
the start of the curve, is

$$y = y_0\,(X-a)^{q_2}\,X^{-q_1}$$ Type VI [13:74]
Type IV is given by points in the region below
the line β₁ = 4δ(2+δ) and to the right of the
line β₁ = 0. This is the only region in which the
range is unlimited in both directions. The
general equation, with origin at νa/r beyond
the mean, is

$$y = y_0\left(1+\frac{X^2}{a^2}\right)^{-m}\exp\left[-\nu\tan^{-1}\left(\frac{X}{a}\right)\right]$$ Type IV [13:75]

SECTION 8. THE VARIANCE ERROR OF A STATISTIC DERIVED FROM
ONE HAVING A KNOWN VARIANCE ERROR


Let K be the statistic having the known variance
error V_K, and let D be the statistic derived
from K, whose variance error, V_D, is desired.
In general σ_K/K is a magnitude having a factor
1/√N, 1/√(N−1), or the like, so that, if N is at
all appreciable the square and higher powers of
σ_K/K are negligible in comparison with σ_K/K. In
this case, but only in this case, the methods
here given suffice.

The binomial expansion method: Students with
no knowledge of the calculus can use this method
and it has a surprisingly wide field of
applicability. Using the bar to indicate mean values
when many samples are taken, and letting d be
the error in the sample value of D and k the
error in the sample value of K, we have D = D̄ + d, and
K = K̄ + k, so that

$$\bar{D} + d = f(\bar{K} + k)$$

Expressing f(K̄ + k) as a binomial with positive,
negative, integral, or fractional exponents,
expanding and keeping the terms of zero and first
degree only in k, as may be done if higher
degree terms than the first in (k/K̄) are negligible
in comparison with the first, we obtain

$$\bar{D} + d = \varphi(\bar{K}) + k\,\psi(\bar{K})$$

The variables when many samples are taken are d
and k, so we have

$$V_D = [\psi(\bar{K})]^2\,V_K$$ The variance of a derived statistic as a function of the variance of the given statistic (binomial expansion method) [13:76]

To illustrate this method let us derive the
variance of the standard deviation knowing the
variance of the variance.

$$\bar{\sigma} + s = (\bar{V} + v)^{1/2} = \bar{V}^{1/2} + \frac{v}{2\bar{V}^{1/2}}$$

Computing the variance of the right and left
hand members we have

$$V_\sigma = \frac{V_V}{4\bar{V}}$$ Variance error of σ in case N is ample [13:77]


Substitution of statistical deviations for
calculus differentials method. The degree of
accuracy in this procedure is the same as in
that just given. Using the same notation as
before it is assumed that the calculus ratio dD/dK is
essentially equal to the statistical ratio d/k.
Given D = f(K) we obtain the derivative

$$dD = f'(K)\,dK$$

and substitute for the differentials, writing

$$d = f'(\bar{K})\,k,$$

thence immediately

$$V_D = [f'(\bar{K})]^2\,V_K$$ The variance of a derived statistic (statistical deviations for differentials method) [13:78]

To illustrate in the case of V_σ:

$$\sigma = V^{1/2}$$

the differential equation is

$$d\sigma = \frac{dV}{2V^{1/2}}$$

substituting statistical deviations, noting that
σ = σ̄ + s, and that V = V̄ + v

$$s = \frac{v}{2\bar{V}^{1/2}}$$

so that, computing the variance of both members,

$$V_\sigma = \frac{V_V}{4\bar{V}}$$


Expanding by a Taylor (or Maclaurin) series
method. The degree of accuracy depends upon the
number of terms of the series kept. When kept
to the f' term the accuracy is the same as in
the preceding methods. Using the same notation
as before,

$$D = \bar{D} + d = f(\bar{K} + k) = f(\bar{K}) + k\,f'(\bar{K}) + \frac{k^2}{2}\,f''(\bar{K}) + \cdots$$

Keeping the expansion to two terms and taking
the variance of both members,

$$V_D = [f'(\bar{K})]^2\,V_K$$ The variance of a derived statistic (Taylor's series method) [13:80]

To illustrate in the case of V_σ:

$$\sigma = \bar{\sigma} + s = (\bar{V} + v)^{1/2} = \bar{V}^{1/2} + \tfrac{1}{2}\,\bar{V}^{-1/2}\,v + \cdots$$

and taking the variance of both members

$$V_\sigma = \frac{V_V}{4\bar{V}}$$


Logarithmic differentials method. This is a
special case of the substitution of statistical
deviations for differentials method, which is
very convenient when the derived statistic is a
function of products or quotients of other
statistics. Let

$$D = K^a L^b M^c$$ [13:82]

wherein the variances and covariances of the
variables K, L, and M are known. Taking the
logarithm of [13:82] we have

$$\log D = a\log K + b\log L + c\log M$$ [13:83]

Taking logarithmic differentials

$$\frac{dD}{D} = a\,\frac{dK}{K} + b\,\frac{dL}{L} + c\,\frac{dM}{M}$$

Substituting statistical deviations for
differentials, squaring, summing, and dividing by the
number of samples to get variances, yields

$$\frac{V_D}{D^2} = \frac{a^2 V_K}{K^2} + \frac{b^2 V_L}{L^2} + \frac{c^2 V_M}{M^2} + \frac{2ab\,C_{KL}}{KL} + \frac{2ac\,C_{KM}}{KM} + \frac{2bc\,C_{LM}}{LM}$$ (logarithmic differentials method) [13:84]

An illustration of this method is given in
Section 9.
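
The methods of this section all come to the same first-order rule, which can be tried numerically. The sketch below is illustrative only: `derived_variance` estimates f'(K̄) by a central difference (a numerical convenience, not a method of the text) and applies V_D = [f'(K̄)]² V_K, and `log_differential_variance` applies the logarithmic differentials result to a product of powers, with the covariance matrix supplied by the user. All names are hypothetical.

```python
import math


def derived_variance(f, k_mean, var_k, h=1e-5):
    """First-order variance of D = f(K): the squared slope at k_mean times var_k."""
    slope = (f(k_mean + h) - f(k_mean - h)) / (2 * h)   # central difference for f'
    return slope ** 2 * var_k


def log_differential_variance(values, exponents, cov):
    """Variance of D = product of values[i] ** exponents[i].

    cov[i][j] is the covariance of statistics i and j (cov[i][i] the variance).
    """
    d = math.prod(v ** e for v, e in zip(values, exponents))
    relative = 0.0
    for i, (vi, ei) in enumerate(zip(values, exponents)):
        for j, (vj, ej) in enumerate(zip(values, exponents)):
            relative += ei * ej * cov[i][j] / (vi * vj)
    return d * d * relative


if __name__ == "__main__":
    # Standard deviation from a variance of 4 whose own variance error is .01.
    print(derived_variance(math.sqrt, 4.0, 0.01))
    print(log_differential_variance([4.0], [0.5], [[0.01]]))
```

Both calls treat σ = V^(1/2) with V̄ = 4 and V_V = .01, and both return V_V/(4V̄) = .000625, the value given by the variance error formula for σ.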


SECTION 9. THE VARIANCE ERROR OF A COEFFICIENT OF 
CORRELATION CORRECTED FOR ATTENUATION 


We shall employ logarithmic differentials in
the computation of this variance error. The
coefficient of correlation corrected for
attenuation is most commonly derived from three observed
values: (1) the correlation between the scores
upon two tests; (2) the correlation between
similar halves of the first test, i.e., the half
test reliability, and (3) the half test
reliability for the second test. We designate the
variables involved as follows:
х, апа х, аге the similar halves of x,, the 
first measure. 
x, and х are the similar halves of х,, the 
second measure. 
ху = х,+ XQ; OXQ7XQXQj o У ‚ох, апа 
х, correlate approximately equally withoth- 
er variables, as do also x, and x,. 


For convenience we employ such units that V,- 
Ист» 
V, = 2+2г,,; И, = 24275; With these units 
covariances are: C4, = 13,3 Суб = Tugs 
Cy, = Wltr,, УНР rj, = Сб ису 
4с €4176,471*r,,; Cya Cga l1tf з €557€55 
7c,47€447204,701,/2 
In the formula for the coefficient corrected for 


attenuation 
f12 


B E 
ОУ revere 


the r, and r, are not observed values but Spear- 
man-Brown formula [11:10] stepped up values. 
Before developing a formula for its variance er- 
ror we must express г, in terms of the observed 


items, thus 


ro ==—————— ==... 3:8 


ltr; s ltr, 
-1 S 1 


3 23 ЁЭ 
127 tio (145) i (Ltr 55) (rye)? Clty, ) 03:86) 
Taking logarithmic differentials of [13:86],
substituting statistical deviations for
differentials, and computing the variance, we obtain


AV, V y 
Loy Dios NETS "35 
= + + —— 5 
2 2 2 2 2 
I rig 35 4 4( Im) 
V, c (e С 
t46 712735 712735 712^ u6 
pr e uL зге аа ы MET 
2 
4@+г„) гу 35 тату) гугу 
[13:87] 
С 14 с с 
К 12" u6 T35 35746 35746 
csset rues S CI ee фы SNO 


гг) 2г,,(18г,,) 2r Руб 2r, ‚(1+г„) 


с © 14 
"35746 $ 735^ u6 Tu6 


20m.) 2 чи) — 2r, 87,4) 


If we now substitute variances of correlation 
coefficients as given by [10:46] and covariances 
as given by [13:134] we secure, after some sim- 
plification 


2 
r 
4 
(А м [13:88] * 


r Фу r? 
2Y A(N-2) Tio T35 lug f45 146 


Variance error of Гу when obtained via [13:85] or [13:93] 


* B. Babington Smith has kindly checked this derivation.

By a similar process we find that if a
correction for attenuation in one variable only is
made so that


Coefficient of correlation 


I 
= 12 t 4 
r = — 12 corrected for attenuation in а 
iy one variable only [13:89] 
2r, 
1+6 


The variance error is


2 
25922712 47:52 Б 5 
=— +— -— 4+ -— . 
22 Я о ) [13:90] 


Variance error of гү, when obtained vía [13:89] 


A way equally excellent to [13:85] for util- 
izing all the data to obtain ris by the use of 
Yule's formula. : 

?; РР ОА 


any Е 
35 Г чб 


G. U. Yule's formula [13:91] 


for r 


The writer has proven that to the degree of pre- 
cision given by terms involving 1/(№-2) the vari- 
ance error of r „ thus determined is the same аз 
of гу determined by [13:85]. 


* Stanley Smith Stevens, H. L. Long, and E. B. Stabler have
kindly checked this derivation.


SECTION 10. THE EQUI-PROBABLE AND THE MEAN RANGES FOR
SAMPLES OF DIFFERENT SIZE DRAWN FROM A NORMAL POPULATION


We let the scores, as deviations from the mean
in terms of the population standard deviation, be
designated x. From Tippett's table (1925) (see
also Pearson, 1932) we can find the probability
that the value x will be the largest observation
for a sample of N (Tippett's tables cover samples
of N = 3, 5, 10, 20, 30, 50, 100, (100), 1000).
Let us choose a small interval (the writer
chose i = .2σ) and designate the probability of
x in the interval (x − .5i) to (x + .5i) as f_x, as
readily computed from Tippett's tables. Let us
arbitrarily choose some range, ra. Let p_(x−ra) =
the probability that a measure drawn at random
from the normal population < (x − ra). This
probability is immediately available from a table of
normal probability functions. Let p_x = the
probability that a measure drawn at random < x.

Then (p_x − p_(x−ra))/p_x is the probability that in a
sample whose largest measure is x a second measure
> (x − ra), and the probability that the (N − 1)
measures other than the measure x > (x − ra) is
[(p_x − p_(x−ra))/p_x]^(N−1). Thus the probability that
the range in this sample of N > ra is

$$1 - \left[\frac{p_x - p_{x-ra}}{p_x}\right]^{N-1}$$

The frequency of occurrence of this situation is
f_x. Finally we sum f_x(1 − [(p_x − p_(x−ra))/p_x]^(N−1)) for
all successive values of x to obtain the
probability that the observed range will exceed ra.
We can repeat this process for different values
of ra until we experimentally find that value
for which the probability = .5.

To illustrate in the case of N = 10. With ra =
3.1σ the computation described yielded p = .46240,
the probability that a range > 3.1σ would be
obtained from a sample of 10. Then ra = 3.0σ
was investigated and the probability of a range
> 3.0σ was found to be .51228. Linear
interpolation between these values to find the range
for which p = .5 gives 3.0246σ, the equi-probable
range as recorded in Table XIII F herewith.
The mean range is a little greater than the
equi-probable range. Its value, as given by Tippett,
is recorded in the last column.
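
The summation just described can be sketched in a few lines. The version below is an approximation under stated assumptions: in place of Tippett's tables it takes the probability of the largest observation directly from the normal curve, as N φ(x) Φ(x)^(N−1) times the interval, and it uses a much finer interval than .2σ. The function names, the interval of .01σ, and the search limits are illustrative choices, not values from the text.

```python
import math


def phi(x):
    """Ordinate of the unit normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def Phi(x):
    """Area of the unit normal curve below x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def prob_range_exceeds(ra, n, step=0.01, low=-8.0, high=8.0):
    """Probability that the range of a normal sample of n exceeds ra (in sigma units)."""
    total = 0.0
    for k in range(int(round((high - low) / step))):
        x = low + (k + 0.5) * step
        p_x = Phi(x)
        f_x = n * phi(x) * p_x ** (n - 1) * step        # largest observation in this interval
        inside = (p_x - Phi(x - ra)) / p_x              # one other measure exceeds x - ra
        total += f_x * (1 - inside ** (n - 1))
    return total


def equi_probable_range(n):
    """Range exceeded with probability .5, found by bisection."""
    low, high = 0.0, 12.0
    for _ in range(50):
        mid = 0.5 * (low + high)
        if prob_range_exceeds(mid, n) > 0.5:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    print(prob_range_exceeds(3.0, 10), prob_range_exceeds(3.1, 10))
    print(equi_probable_range(10))
```

For N = 10 the two probabilities should fall close to the .51228 and .46240 of the illustration, and the equi-probable range close to 3.02σ; small differences are to be expected because the text works with a coarser interval and with tabled values.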


TABLE XIII F

THE EQUI-PROBABLE RANGE, AND THE MEAN RANGE, OF
SAMPLES DRAWN FROM A NORMAL POPULATION
(in terms of the population σ)

N         EQUI-PROBABLE RANGE    MEAN RANGE
                                 (FROM TIPPETT)

6              2.46                2.5344
10             3.0246              3.0775
25             3.90                3.9306
50             4.47                4.4981
100            4.9683              5.0152
300            5.70                5.7555
1000           6.4379              6.4829
10000          7.67
100000         8.78


SECTION 11. THE OPTIMAL SIZE OF INTERVAL FOR
HISTOGRAM OR FREQUENCY POLYGON


The temperature data, Table IV J, are not
finely enough graduated to illustrate well the
issues involved in connection with the interval
for graphic portrayal, so we will deal with the
heights (hypothetical) of 100 American male
white adults recorded to the nearest .01 of an
inch, Table XIII G. With such fine grouping,
few if any classes will have a frequency greater
than 1. A plot with this very fine grouping is
shown in Chart XIII V, which, all will agree, is
quite uninformative.

CHART XIII V
[Chart not reproduced; horizontal axis marked 50, 55, 60, 65, 70]

If we group in intervals of 5 inches, it is
as shown in Chart XIII VI, which again is not
very informative.

CHART XIII VI
[Chart not reproduced; horizontal axis marked 50, 55, 60, 65, 70]

We clearly desire to group using some interval
greater than .01 inch and less than 5 inches.
If we group using .5-inch intervals the curve of
Chart XIII VII results.

CHART XIII VII
[Chart not reproduced; horizontal axis marked 50, 55, 60, 65, 70]

To the trained statistician, aware of the
fluctuations to be expected from sampling, this
might be quite satisfactory. However, graphic
portrayal is essentially a popular device and is
supposed to convey information to the
statistically uninitiated. If such a person sees Chart
XIII VII he likely will interpret as significant
all the small fluctuations from class to class. He
certainly has no standard as to where to draw the
line between significant and insignificant
fluctuations. This is in fact a difficult line for
the well-trained statistician to draw. The
original measurements, we will say, are as given in
Table XIII G.


TABLE XIII G

52.63    56.45    57.68    58.67    59.84
53.30    56.59    57.76    58.73    59.91
53.47    56.67    57.77    58.74    59.96
53.91    56.68    57.81    58.81    60.13
54.09    56.77    57.84    58.87    60.30
54.35    56.81    57.90    58.97    60.37
54.72    56.86    57.98    58.98    60.45
54.78    56.91    58.00    59.06    60.67
55.07    56.95    58.04    59.13    60.81
55.17    57.11    58.09    59.19    60.83
55.43    57.13    58.13    59.26    60.93
55.81    57.17    58.20    59.32    61.09
55.89    57.24    58.25    59.39    61.37
55.96    57.27    58.27    59.44    61.59
56.08    57.36    58.33    59.50    61.61
56.11    57.43    58.36    59.51    61.75
56.19    57.48    58.41    59.63    62.41
56.23    57.49    58.51    59.69    62.58
56.34    57.53    58.55    59.71    62.87
56.36    57.63    58.58    59.84    63.03
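
The effect of the interval upon the grouped frequencies can be seen by tallying the measurements of Table XIII G at several widths. The sketch below is illustrative: the starting point of 50 inches and the convention that each class includes its lower limit are assumptions, and the tally is carried out in hundredths of an inch so that no score is misplaced by rounding.

```python
HEIGHTS = [
    52.63, 53.30, 53.47, 53.91, 54.09, 54.35, 54.72, 54.78, 55.07, 55.17,
    55.43, 55.81, 55.89, 55.96, 56.08, 56.11, 56.19, 56.23, 56.34, 56.36,
    56.45, 56.59, 56.67, 56.68, 56.77, 56.81, 56.86, 56.91, 56.95, 57.11,
    57.13, 57.17, 57.24, 57.27, 57.36, 57.43, 57.48, 57.49, 57.53, 57.63,
    57.68, 57.76, 57.77, 57.81, 57.84, 57.90, 57.98, 58.00, 58.04, 58.09,
    58.13, 58.20, 58.25, 58.27, 58.33, 58.36, 58.41, 58.51, 58.55, 58.58,
    58.67, 58.73, 58.74, 58.81, 58.87, 58.97, 58.98, 59.06, 59.13, 59.19,
    59.26, 59.32, 59.39, 59.44, 59.50, 59.51, 59.63, 59.69, 59.71, 59.84,
    59.84, 59.91, 59.96, 60.13, 60.30, 60.37, 60.45, 60.67, 60.81, 60.83,
    60.93, 61.09, 61.37, 61.59, 61.61, 61.75, 62.41, 62.58, 62.87, 63.03,
]


def group(values, start, width):
    """Class frequencies for classes [start, start+width), [start+width, ...), etc.

    start and width are in inches; every value must be at least start.
    """
    start_h = round(start * 100)       # work in hundredths of an inch
    width_h = round(width * 100)
    counts = {}
    for v in values:
        index = (round(v * 100) - start_h) // width_h
        counts[index] = counts.get(index, 0) + 1
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


if __name__ == "__main__":
    for width in (5, 1, 0.5):
        print(width, group(HEIGHTS, 50, width))
```

With the 5-inch interval three quarters of the cases fall in a single class, while with the .5-inch interval the class frequencies fluctuate from class to class in the manner the text describes.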


We therefore seek a rule for coarseness of
grouping for graphic portrayal which will result
in a frequency polygon which, probably, will not
be misleading to the untutored reader who
interprets the directions of the frequency
fluctuations shown as meaningful.

CHART XIII VIII
[Chart not reproduced; horizontal axis marked x₁, x₂, x₃, x₄]

If the graph looks
like that of Chart XIII VIII, we desire that the
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reader have at least an even chance of being 
right if he believes that in the population the 
frequency in class х, exceeds that in x, and 
falls short of that in x,. Of all the possible 
comparisons between class frequencies we may say 
that the edict that f, is greater than f, or f, 
is the one most likely to be made, and the one 
above all others that we should seek to make 
trustworthy. 

The foregoing is not quite as refined a
statement of the problem as is necessary because it
does not cover a case that approximates that of
Chart XIII IX. Here the naive judgment is that
the mode lies between x₂ and x₃. We should hope
that this judgment would be one having a 50:50
or better chance of being right.

CHART XIII IX
[Chart not reproduced; horizontal axis marked x₁, x₂, x₃, x₄]

We will now state specifically the problem set:
It is to determine the coarseness of grouping to
employ so that the observed mode (intermediate
between two class indexes if the naive judgment
so places it) of a sample of N drawn from a
normal population has a probability of .5 of lying
in an interval having the true mode as its class
index. If we can determine this interval, then
we may conclude that the mode shown in a graphic
representation using this interval, of a sample
drawn from a normal population, will have a
probability of .5 of not being in error by more than
one-half the interval used, i.e., P.E. = .5i, if
the parent population is normal.

Except for the infrequent samplings yielding
a greater frequency in a class removed two or
more intervals from the true modal class than in
the modal class, this statement is equivalent to
the following: Given a normal population grouped
in classes, with interval i for convenience
expressed in terms of σ as the unit, and with the
true mode as the class index of one of the
classes, then the desired interval i is such that
the probability is .5 that, in a sample, f₂ is
greater than f₁ and also greater than f₃.

If f₂ > f₁ then (f₂ − f₁) > 0, and if f₂ > f₃
then (f₂ − f₃) > 0, and in a scatter diagram with
axes (f₂ − f₁) and (f₂ − f₃) the relative frequency
above the line f₂ − f₁ = 0 is the probability that
f₂ > f₁, and the relative frequency to the right
of the line f₂ − f₃ = 0 is the probability that f₂
> f₃, and the relative frequency in the upper
right sector is the probability that f₂ > f₁ and
also > f₃.

Specifically employing an interval, i, let us
be given three successive classes, with sample
frequencies f₁, f₂, and f₃, such that the class
index of the second class coincides with the
parent normal population mode. Let N = the
number in the sample. Let f̂₁, f̂₂, and f̂₃, in which
f̂₃ = f̂₁, be the population frequencies in these
three classes.

CHART XIII X
[Chart not reproduced]

In the population f̂₂ is the
largest, but because of the vicissitudes of
sampling, f₂ may not be the largest in the sample.
The greater the interval the more likely it is
that f₂ will be the largest. We shall determine
the interval, associated with a given N, for
which the probability is .5 that f₂ is the
largest. We shall consider this the desirable
interval for graphic portrayal because then,
except for the small chance that some fₙ > f₁ or
f₂ or f₃, we have at least an even chance that
the sample mode will not differ by more than half
an interval from the true mode. Since it would
only be by rare chance that the class index of
the second class would coincide with the
population mode, we must consider the sample mode to be
that obtained by smoothing the frequencies in
the four classes constituting the modal group
for the sample, by some such process as [7:21]
and [7:24].

CHART XIII XI
[Chart not reproduced]

The correlation surface Chart XIII XI will be
approximately normal if the f's are not too small.
Assuming that many samplings are made, the
constants of this correlation situation are readily
available. They are as given herewith, in which
a circumflex over a symbol indicates a
population value, and M, V, and r are the mean,
variance, and correlation. Also p̂ = f̂/N and q̂ = 1 − p̂.

$$M_{f_2-f_1} = M_{f_2-f_3} = N(\hat{p}_2 - \hat{p}_1)$$

$$V_{f_2-f_1} = V_{f_2-f_3} = N(\hat{p}_1\hat{q}_1 + \hat{p}_2\hat{q}_2 + 2\hat{p}_1\hat{p}_2)$$

$$r = \frac{\hat{p}_2\hat{q}_2 + 2\hat{p}_1\hat{p}_2 - \hat{p}_1^2}{\hat{p}_1\hat{q}_1 + \hat{p}_2\hat{q}_2 + 2\hat{p}_1\hat{p}_2}$$

In terms of the standard deviation, the point of
dichotomy at which f₂ − f₁ = 0 is M/√V
below the mean of the distribution. Similarly
for the point in the other variable at which
f₂ − f₃ = 0. It is now a simple matter to determine
i by interpolating between the tables by Lee et
alia (Pearson, 1914) giving the proportionate
frequencies in upper right quadrangles for
different dichotomies and different correlations.
This has been done for certain values of N as
reported in Table XIII H herewith.
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TABLE XIII H 


THE SIZE (in c units) OF THE INTERVAL GIVING A PROBA- 
BILITY OF .5 THAT THE FREQUENCY OF THE MEDIAN INTERVAL 
SHALL EXCEED THAT OF EITHER NEIGHBORING INTERVAL, IN 
THE CASE OF A NORMAL DISTRIBUTION; THE EXPECTED RANGE 
(in o units); AND THE NUMBER OF CLASSES NECESSARY TO 
COVER THIS RANGE 


: 1 Ка №. OF CLASSE 
N uid ЕТЕР = Ri 
6 „90562 2.45 2.72 
10 .81215 3.0246 3.724 
25 „67082 3.90 5.8! 
50 „58140 4.47 7.59 
100 450402 1.9683 9.857 
300 «40347 5.70 14.13 
1000: ‚31226 6.4379 20.62 
10000 «19490 7.67 39.4 
100000 «12499 8.78 70. 


The equi-probable ranges given in the third column
are those derived in the immediately preceding
Section 10. The values in the fourth column,
rounded off to the nearest integer, give the
practical answer that we have sought, the number
of classes to employ in the graphic portrayal
of samples of different sizes. The information
of the last column is repeated in Chapter IV,
Table IV M, in a form which is simpler to use.
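
The relation between the columns of Table XIII H can be checked directly. The following is a minimal sketch in Python, assuming that the fourth column is the range divided by the interval; the three rows are copied from the table and the function name is illustrative only.

```python
# Three rows as read from Table XIII H: (N, interval i, range R, tabled classes).
rows = [
    (10, 0.81215, 3.0246, 3.724),
    (300, 0.40347, 5.70, 14.13),
    (1000, 0.31226, 6.4379, 20.62),
]


def classes_needed(interval, expected_range):
    """Number of classes needed to cover the range: R / i."""
    return expected_range / interval


for n, i, r, tabled in rows:
    exact = classes_needed(i, r)
    # The rounded value is the practical number of classes for a chart.
    print(n, round(exact, 3), tabled, round(exact))
```

Rounding the quotient to the nearest integer gives the practical number of classes, e.g., 21 classes for a sample of 1000.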

SECTION 12. DIRECT AND INVERSE INTERPOLATION 
There are many methods of interpolation where
intervals are equal. A time-saving method for
direct interpolation is by means of tabled Lagrangian
interpolation coefficients as exemplified
in Tables XV A and XV B.

There are various methods available when two- 
way interpolation is called for. One serviceable 
method is by successive one-way interpolations.

Satisfactory direct interpolation when intervals
are unequal is more complicated. For one
treatment of this matter, see E. T. Whittaker and 
G. Robinson, The Calculus of Observations, 1924, 
СЕ. 

Inverse interpolation occurs when the tabled 
values (unequal intervals) are used for purposes 
of entry and the answer sought is in terms of 
the variable (equally spaced) which is usually 
the argument. Linear inverse interpolation is
simple, but if it is insufficiently accurate, quadric
inverse interpolation, as later described, may be
used.

a is the argument for which a value t is desired,
or in the inverse problem t is the argument
for which a is desired. $a_0 < a < a_1$. The interval
in the equally spaced arguments is i.

$i = a_1 - a_0$   [13:92]

The fractional distance of the $a_0$ to $a_1$ interval
to reach a is p.

$p = \frac{a - a_0}{a_1 - a_0} = \frac{a - a_0}{i}$, also $q = 1 - p$.   [13:93]

For inverse interpolation

$p' = \frac{t - t_0}{t_1 - t_0}$, also $q' = 1 - p'$.   [13:94]


When it is necessary to distinguish between a's
and t's for different p's they are designated $a_p$
and $t_p$. When it is necessary to distinguish between
t's gotten by two-, three-, four-point,
etc. interpolation they are designated $t''$, $t'''$,
$t^{iv}$, etc. When it is necessary to distinguish
between a's gotten by two-, three-, and four-point
inverse interpolation they are designated
$a''$, $a'''$, and $a^{iv}$.


We define symbols as in Table XIII I and as
in the following defining equations.


TABLE XIII I
NOTATION FOR ARGUMENTS, TABLED ENTRIES AND DIFFERENCES.


ARGU- | TABLED  |                  DIFFERENCES
MENTS | ENTRIES | 1ST   | 2ND   | 3RD   | 4TH   | 5TH   | 6TH
      |         | ORDER | ORDER | ORDER | ORDER | ORDER | ORDER


= ‘a lan i 

а, ty 

а, ti 

а, t; 

9, 5 

i. SN 


In connection with inverse interpolation we
require A and B following:


^ir
AH
^ 28 Oe Ce nix 56 ег 1 V) иза ба " 
AI [13:95] 
0 
-ALTI 
DES А rats Ceca se HS Об] 
АТ 
0 


We shall judge of the accuracy of interpola- 
tion by a parabola of a certain degree by com- 
paring the result obtained thereby with that ob- 
tained by using a parabola of the next higher 
degree. This comparison generally leads to a 
slight, but only slight, overstatement of the 
error. 

The following formulas give successively
closer approximations to the correct value $t_p$.

$t''_p = (1-p)\,t_0 + p\,t_1$   Two-point, or linearly interpolated value   [13:97]

$t'''_p = \frac{1}{2}\,p(1-p)(2-p)\left[\frac{1}{p}\,t_0 + \frac{2}{1-p}\,t_1 - \frac{1}{2-p}\,t_2\right]$   Three-point, or quadric value   [13:98]

If p < .5 interpolate in the other direction
using $t_1$, $t_0$, and $t_{-1}$.

$t^{iv}_p = \frac{1}{6}\,p(1+p)(1-p)(2-p)\left[\frac{-1}{1+p}\,t_{-1} + \frac{3}{p}\,t_0 + \frac{3}{1-p}\,t_1 - \frac{1}{2-p}\,t_2\right]$   Four-point, or cubic value   [13:99]


Formulas [13:98] and [13:99] are of the forms

$t'''_p = c_0 t_0 + c_1 t_1 + c_2 t_2$   [13:98a]

$t^{iv}_p = c_{-1} t_{-1} + c_0 t_0 + c_1 t_1 + c_2 t_2$   [13:99a]

These c-coefficients are known as Lagrangian
interpolation coefficients. Tables giving them
for interpolation up to eleven-point and for
different values of p are available. See Chapter
XV, Tables XV A and XV B and references given in
Chapter XV, Section 1.
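
The two-, three-, and four-point values can be sketched in Python as follows. This is an illustration only, assuming tabled entries t_{-1}, t_0, t_1, t_2 at equally spaced arguments and p the fractional distance from a_0 toward a_1; the coefficients are the ordinary Lagrangian ones and the function names are not from the text.

```python
def two_point(p, t0, t1):
    """Linear value, the form of [13:97]."""
    return (1 - p) * t0 + p * t1


def three_point(p, t0, t1, t2):
    """Quadric value through the entries at a_0, a_1, a_2."""
    c0 = (1 - p) * (2 - p) / 2
    c1 = p * (2 - p)
    c2 = -p * (1 - p) / 2
    return c0 * t0 + c1 * t1 + c2 * t2


def four_point(p, tm1, t0, t1, t2):
    """Cubic value through the entries at a_{-1}, a_0, a_1, a_2."""
    cm1 = -p * (1 - p) * (2 - p) / 6
    c0 = (1 + p) * (1 - p) * (2 - p) / 2
    c1 = (1 + p) * p * (2 - p) / 2
    c2 = -(1 + p) * p * (1 - p) / 6
    return cm1 * tm1 + c0 * t0 + c1 * t1 + c2 * t2


# A table of squares (arguments 0, 1, 2) and of cubes (arguments -1, 0, 1, 2).
print(two_point(0.5, 0, 1))          # linear value between 0 and 1
print(three_point(0.5, 0, 1, 4))     # a quadric reproduces a square exactly
print(four_point(0.5, -1, 0, 1, 8))  # a cubic reproduces a cube exactly
```

A quadric through three entries reproduces any second-degree function exactly, and a cubic through four entries any third-degree function, which gives a convenient check on the coefficients.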


A good approximation to the error in $t''_p$ is
$(t'''_p - t''_p)$, in $t'''_p$ it is $(t^{iv}_p - t'''_p)$, and in
$t^{iv}_p$ it is $(t^{v}_p - t^{iv}_p)$. These errors are maximal
when p = .5. In this case we designate them $E''$,
$E'''$, and $E^{iv}$.

$E'' = \frac{1}{8}(-t_0 + 2t_1 - t_2)$   Maximal error in linear interpolation   [13:100]

$E''' = \frac{1}{16}(t_{-1} - 3t_0 + 3t_1 - t_2)$   Maximal error in three-point interpolation   [13:101]

$E^{iv} = \frac{3}{128}(t_{-2} - 4t_{-1} + 6t_0 - 4t_1 + t_2)$   Maximal error in four-point interpolation   [13:102]


For the Table XV C of the normal probability
functions given in Chapter XV, Section 3, the
$E''$ and $E'''$ values recorded at the bottom of
columns give these maximal linear and quadric
interpolation errors as determined from values
approximately halfway down the columns.
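
The maximal errors at p = .5 reduce to simple combinations of neighboring tabled entries. The sketch below, in Python, assumes the forms (1/8)(-t_0 + 2t_1 - t_2), (1/16)(t_{-1} - 3t_0 + 3t_1 - t_2), and (3/128) times the fourth difference; the function names are illustrative. The sign of the three-point error depends on which three entries the quadric employs, so its magnitude is the quantity of interest.

```python
def max_error_linear(t0, t1, t2):
    """Three-point value minus two-point value at p = .5."""
    return (-t0 + 2 * t1 - t2) / 8


def max_error_three_point(tm1, t0, t1, t2):
    """Difference between four-point and three-point values at p = .5."""
    return (tm1 - 3 * t0 + 3 * t1 - t2) / 16


def max_error_four_point(tm2, tm1, t0, t1, t2):
    """Five-point value minus four-point value at p = .5."""
    return 3 * (tm2 - 4 * tm1 + 6 * t0 - 4 * t1 + t2) / 128


# Squares at 0, 1, 2: the linear value at .5 is .50, the true value .25.
print(max_error_linear(0, 1, 4))
# Cubes at -1, 0, 1, 2.
print(max_error_three_point(-1, 0, 1, 8))
# Fourth powers at -2, -1, 0, 1, 2.
print(max_error_four_point(16, 1, 0, 1, 16))
```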

For inverse interpolation, i.e., the procedure
for obtaining a knowing t, the following formulas
hold:

$a'' = a_0 + i\,p'$   Linear inverse interpolation   [13:103]

$a''' = a_0 + i\left(p' + \frac{A\,p'q'}{2 + 3A + A^2}\right)$

or approximately

$a''' = a_0 + i\,[p' + .25\,p'q'A(2-3A)]$   [13:104]
Inverse quadric interpolation
Defining the error in inverse two-point interpolation
as the difference between the three-point
and two-point inverse interpolation answers,
and designating the maximum value of this
error ЕНЕ, we have


Maximum error ín 


Bab ea al A inverse linear [13:105] 


interpolaticn 
Similarly, 


ЕР e ‚6625-1 [5] олло Ва. 


interpolation 
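
Inverse interpolation can be sketched in Python as follows. The linear step is a_0 + ip'. For the quadric step the sketch does not use the A and B formulas of this section; it substitutes a direct solution of the three-point parabola for p, taking the root nearest the linear answer. The names are illustrative, and it is assumed that t lies between t_0 and t_1 and that the parabola actually attains the value t.

```python
import math


def inverse_linear(t, a0, i, t0, t1):
    """Linear inverse interpolation: a_0 + i * p'."""
    p_prime = (t - t0) / (t1 - t0)
    return a0 + i * p_prime


def inverse_quadric(t, a0, i, t0, t1, t2):
    """Solve t = t0 + p*d1 + p(p-1)/2 * d2 for p (not the text's A formula)."""
    d1 = t1 - t0                 # first difference
    d2 = t2 - 2 * t1 + t0        # second difference
    p_prime = (t - t0) / d1
    if d2 == 0:
        return a0 + i * p_prime  # the table is linear here
    qa, qb, qc = d2 / 2, d1 - d2 / 2, -(t - t0)
    root = math.sqrt(qb * qb - 4 * qa * qc)
    candidates = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    p = min(candidates, key=lambda x: abs(x - p_prime))
    return a0 + i * p


# Table of squares at arguments 1, 2, 3; which argument has the entry 2.25?
print(inverse_linear(2.25, 1, 1, 1, 4))
print(inverse_quadric(2.25, 1, 1, 1, 4, 9))
```

In the example the linear answer is about 1.417 and the quadric answer is 1.5, the difference being the kind of error in inverse linear interpolation discussed above.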


SECTION 13. THE MACHINE EXTRACTION OF SQUARE AND CUBE ROOTS


A number of excellent methods are available, 
so the only question is that of expedition of 
process. 

The extraction of square root by the following 
procedure is very rapid: We assume a computing 
machine with no trick devices. It has a key- 
board, a product dial or register, and a rota- 
tion counter dial or register, which latter is 
sometimes called the multiplier, or quotient 
dial. To illustrate the process let us desire 
the square root of 123.456.

Set 123.456 in the product dial.

Set 11. (an inspectional estimate of the answer)
in the keyboard.

Divide, keeping the quotient to 4 figures
(twice the number in the trial root) obtaining
11.22.

Make mental note of 11.11, the average of 11.
and 11.22.

Clear the rotation counter by multiplying by
11.22, thus restoring 123.456 to the product
dial.

Change the keyboard to 11.11 and divide obtaining
11.112151.

The average of this and 11.11, namely 11.111076,
is the desired root correct to eight figures,
which is twice as many figures as the number
which is the same in the second approximation
(11.11) and the quotient (11.112151).
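
The divide-and-average cycle just described can be traced in Python. This is a sketch of the arithmetic only, not of any particular machine; the function name is illustrative and the rounding of the first quotient to four figures imitates the step in the text.

```python
import math


def divide_and_average(number, trial):
    """One cycle: divide by the trial root, then average quotient and trial."""
    quotient = number / trial
    return quotient, (trial + quotient) / 2


number = 123.456
first_quotient, _ = divide_and_average(number, 11.0)
first_quotient = round(first_quotient, 2)              # kept to 4 figures
second_trial = round((11.0 + first_quotient) / 2, 2)   # the mental note
second_quotient, root = divide_and_average(number, second_trial)
print(first_quotient, second_trial, round(second_quotient, 6), round(root, 6))
print(abs(root - math.sqrt(number)))                   # remaining error
```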

Let us extract the cube root of 123.456. If 
a slide rule or brief table of cube roots, Chap- 
ter XV, Table D, can be used to get a first es- 
timate the process will be shortened by one or 
two iterations. The slide rule gives 4.98. 

Set 4.98 in the keyboard and square. The 
product dial reading is 24.8004, but this does 
not need to be remembered or recorded. 

The carriage is returned so that a rotation 
will register in the unit's place. 

The keyboard is cleared. 

The multiplication bar is depressed once, not 
changing the product reading, but increasing the 
4.98 by 1 to 5.98. 

The keyboard is set to be identical with the 
product reading. 

The subtraction is made changing the counter
dial to 4.98 and clearing the product dial.
Glance at the product dial to see that this is
so.

Set the "division" lever. 

Multiply by such a number, namely 4.98, as 
clears the rotation counter dial. The product 
dial now holds the cube of 4.98. 

If this cube is greater than 123.456 set the 
"division" lever, or button, and if less set the 
"multiplication" lever. In the present problem 
this cube = 123.50599, so something must be subtracted
from it. We accordingly set the "division"
lever.

Make sufficient additions or subtractions (in
this case subtractions) to make the product dial
reading = 123.456. When this is accomplished we
find -.002016 (minus because the "division" lever
is operating) appearing in the counter dial.

One-third of this, namely -.000672, is the
correction to 4.98. Accordingly the answer is
4.979328. This is correct to twice as many figures
as the trial root was correct, i.e., it is
correct to the last figure. Otherwise stated 
4.9793280 differs from 4.980 in the fourth sig- 
nificant figure so 4.9793280 is in error in the 
eighth significant figure. 

If the value at this point is not accurate to 
enough decimal places repeat the process using 
the improved estimate of the cube root. Usually 
two such computations will suffice to give eight 
figure accuracy. 
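
The arithmetic of one cube root cycle can be sketched in Python. It is assumed here that the counter reading is the signed number of times the square of the trial root must be added to (or subtracted from) the cube to reach the number, and that one-third of this reading is the correction; the function name is illustrative.

```python
def cube_root_step(number, trial):
    """One cycle of the machine process for a cube root."""
    square = trial * trial              # set in the keyboard
    cube = square * trial               # product dial after multiplying
    counter = (number - cube) / square  # signed additions or subtractions
    improved = trial + counter / 3      # one-third of the counter reading
    return cube, counter, improved


cube, counter, root = cube_root_step(123.456, 4.98)
print(round(cube, 6), round(counter, 6), round(root, 6))
```

Repeating the step with the improved value as the new trial root gives the further accuracy mentioned above.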


SECTION 14. OCCASIONAL FORMULAS 


Since the proportion, p, of cases in a class
is the frequency, f, in that class divided by N
and since, under conditions of sampling in which
N is made constant from sample to sample, the
distributional characteristics of p are simply
those of f divided by the constant N, we can immediately
derive statistics of p from those of
f, as given in Chapter IX, formulas [9:02],
[9:03], and [9:04]. We have $Np = f$; $N^2p^2 = f^2$; etc.
so that

$V(p) = \frac{pq}{N}$   Variance of a proportion   [13:107]

$\mu_3(p) = \frac{pq(q-p)}{N^2}$   Third moment of a proportion   [13:108]

$\mu_4(p) = \frac{pq[1 + 3(N-2)pq]}{N^3}$   Fourth moment of a proportion   [13:109]
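
The moments of a proportion can be verified by direct enumeration of the binomial distribution. The following Python sketch assumes the forms pq/N, pq(q-p)/N^2, and pq[1+3(N-2)pq]/N^3 for the second, third, and fourth moments about the mean; the function name and the trial values N = 10, p = .3 are illustrative.

```python
from math import comb


def proportion_moments(n, p):
    """Second, third, fourth central moments of f/N for a binomial f."""
    q = 1 - p
    probs = [comb(n, f) * p**f * q**(n - f) for f in range(n + 1)]
    mean = sum(pr * f / n for f, pr in enumerate(probs))

    def central(k):
        return sum(pr * (f / n - mean) ** k for f, pr in enumerate(probs))

    return central(2), central(3), central(4)


n, p = 10, 0.3
q = 1 - p
v, m3, m4 = proportion_moments(n, p)
print(v, p * q / n)
print(m3, p * q * (q - p) / n**2)
print(m4, p * q * (1 + 3 * (n - 2) * p * q) / n**3)
```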


Occasionally one needs the regression, the
correlation, or the covariance, between the frequencies,
$f_a$, $f_b$, or the proportions, $p_a$, $p_b$, in
two classes. Having the variances of the frequencies,
formula [9:02], both regression and
correlation become available when the covariance
is known. We here derive the covariance, $c(f_a f_b)$,
between the frequencies in two classes by first
deriving the regression of one of these frequencies
upon the other.

Let all classes except the two in question be
considered as a single class, c. The sample frequencies
are $f_a$, $f_b$, $f_c$, such that $f_a + f_b + f_c = N$.

The true, or population, frequencies are $\hat{f}_a$,
$\hat{f}_b$, $\hat{f}_c$, such that $\hat{f}_a + \hat{f}_b + \hat{f}_c = N$. Letting $\Delta$'s represent
deviations in frequency from true values we write
$f_a = \hat{f}_a + \Delta_a$ and $f_b = \hat{f}_b + \Delta_b$. When $\Delta_b$ is some positive
number of cases there must be an exactly compensating
deficiency in the other two classes and
in the long run the deficiency that attaches to
the a class will bear the ratio to the deficiency
that attaches to the c class that $\hat{f}_a$ bears to $\hat{f}_c$.


That is, the regression of $\Delta_a$ upon $\Delta_b$ is given
by the equation

$\Delta_a = \frac{-\hat{f}_a}{\hat{f}_a + \hat{f}_c}\,\Delta_b$

This is the regression of $f_a$ upon $f_b$ and after
making simple substitutions the regression coefficient
can be written

$b(f_a f_b) = b(p_a p_b) = -\frac{p_a}{q_b}$   Regression of one cell frequency upon a second non-overlapping frequency   [13:110]

From this, utilizing the appropriate variances
and standard deviations, we immediately obtain

$r(f_a f_b) = r(p_a p_b) = -\sqrt{\frac{p_a p_b}{q_a q_b}}$   Correlation between non-overlapping cell frequencies, or proportions   [13:111]

$c(f_a f_b) = -N p_a p_b$   Covariance between non-overlapping cell frequencies   [13:112]

$c(p_a p_b) = -\frac{p_a p_b}{N}$   Covariance between non-overlapping cell proportions   [13:113]

The preceding results may be applied to the 
frequencies, or the proportions, in a two-dimen- 
sional contingency table. In case one of the 
frequencies is a part of the second, this second 
can be split into two parts, one correlating 
perfectly with the first frequency and the other 
part having the correlation [13:111]. Making 
allowance for this we can obtain the covariances 
for all the type situations that maintain in a 
two dimensional contingency table. From these 
covariances one can readily compute the covariances 
between proportions and also the correlations 
between frequencies and the correlations between 
proportions. We designate the marginal totals
for the rows $f_a$, $f_b$, etc., those for the columns
$f_{a'}$, $f_{b'}$, etc., and those for the cell lying at
the intersection of the a row and the a' column
$f_{aa'}$, and similarly for cells at other intersections.
It is simple to establish that

$c(f_{aa'} f_{bb'}) = -N p_{aa'} p_{bb'}$   This being [13:112] in new notation


$c(f_a f_{aa'}) = N p_{aa'} q_a$   Covariance between the frequency in a row and the frequency in a cell in that row   [13:114]

Covariance between the   [13:115]
- - еур Т 4 :
г СО
If the first variable is quantitative, having
a mean, $M_1$, and a deviation of row a from the
mean, $x_a$, then, as established by Pearson (1913)

$c(M_1 f_{aa'}) = p_{aa'} x_a$   Covariance between a mean and a cell frequency in a scatter diagram   [13:116]

If both variables are quantitative the true
regression equation of $X_1$ upon $X_2$ is

$\bar{X}_1 = \hat{b}_{12} X_2 + \hat{M}_1 - \hat{b}_{12}\hat{M}_2$

This equation holds for every observed $X_2$ value.
Thus for individual i in a sample of N we have

$\bar{X}_{1i} = \hat{b}_{12} X_{2i} + \hat{M}_1 - \hat{b}_{12}\hat{M}_2$

If such equations are written down for all the N
cases in the sample, added and divided by N, we
have, by noting that $\Sigma X_1 = N M_1$ and $\Sigma X_2 = N M_2$,

$\bar{M}_1 = \hat{b}_{12} M_2 + \hat{M}_1 - \hat{b}_{12}\hat{M}_2$

This establishes that the parameters in the regression
of $M_1$ upon $M_2$ are the same as those in
the regression of $X_1$ upon $X_2$. Thus the regression
of means is the same as the regression of
variables, and it follows that the correlation
between means is the same as that between variables
when samples are randomly drawn from a
single parent population. The best approximations
to these true parameters are the sample
values, yielding


$b(M_1 M_2) = b_{12}$   [13:117]

$r(M_1 M_2) = r_{12}$   [13:118]

$c(M_1 M_2) = \frac{1}{N}\,r_{12}\,\sigma_1\sigma_2$   [13:119]

If in an experimental situation, such e.g., as a
sampling of public opinion, both $r(M_1 M_2)$ and $r_{12}$
can be independently calculated and when so calculated
are found to be different, it is established
that the successive samples have not been
random samples from the same parent population.
We here compute a regression, correlation, and
covariance between two variances. We consider
the population distribution and take $\hat{M}_1$ and $\hat{M}_2$
as the arbitrary origins. We note that the mean


d 
х2 for an array = = У, 2* Since X ТД? zb X, 


this yields the following regression of х? upon 


2 
X5: 


Summing for a sample of N, and dividing by N, we
obtain 


Vo PV. uis EUR LESS UR 


in which 27,2029) /N for the sample of N cases. 
It is an unbiased estimate of V, which fact we 
may express thus: D 217 in which 0 is an un- 
biased estimate of 0, i.e., the mean of these 
for many samples approaches 0. Thus from [13:120] 


we get 


~ 


variances 


29 m 20 ү ression of 
B = 02, = 0,05 7e [13:121] 


2 


T(VjV,) [ad Correlation between [13:122] 


ж 
12 variances 


ое ЕА 
For the best value of r2, based upon sample data
we may use [11:114]. For the best sample esti- 
mates of V, and V, we may use [6:09], and for 


N- = 
1, = VY, and that 


У апа У, we note that Л 
N-1 (N~1)? 
А 


so multiplying the right hand member of [6:55] 


by (1X will yield the desired estimates of 


the true variances needed in [13:123]. 

Further, if we deal with situations in which
N is large and in which we may assume that the
parent bivariate population is normal, the preceding
formulas simplify, yielding:

$b(V_1 V_2) = r_{12}^2\,\frac{V_1}{V_2}$   N large and bivariate normal population   [13:124]

$r(V_1 V_2) = r_{12}^2$   N large and bivariate normal population   [13:125]

$c(V_1 V_2) = \frac{2\,r_{12}^2 V_1 V_2}{N}$   N large and bivariate normal population. See [13:148]   [13:126]

Further variance, correlation, and covariance 
formulas have been derived by Karl Pearson (1913) 
and are here given: 

We employ the Р\, and the Pij notation as 


given in Chapter XI, Section 4. That is, $p_{11} = c_{12}$;
$p_{20} = V_1$; $p_{02} = V_2$; etc. The variance
error of any product moment, when N is large,
that is, when terms of order $1/N^2$, $1/N^3$, etc., are
neglected, is given by

$N\,V(p_{ij}) = p_{2i,2j} - p_{ij}^2 + i^2 p_{20}\,p_{i-1,j}^2 + j^2 p_{02}\,p_{i,j-1}^2 + 2ij\,p_{11}\,p_{i-1,j}\,p_{i,j-1} - 2i\,p_{i+1,j}\,p_{i-1,j} - 2j\,p_{i,j+1}\,p_{i,j-1}$   [13:127]

(N large) Variance error of any product moment
from the means
The covariance between any two product moments,
each involving the same two variables, is given
by

$N\,c(p_{gh}\,p_{ij}) = p_{g+i,h+j} - p_{gh}\,p_{ij} + gi\,p_{20}\,p_{g-1,h}\,p_{i-1,j} + hj\,p_{02}\,p_{g,h-1}\,p_{i,j-1} + gj\,p_{11}\,p_{g-1,h}\,p_{i,j-1} + hi\,p_{11}\,p_{g,h-1}\,p_{i-1,j} - g\,p_{g-1,h}\,p_{i+1,j} - h\,p_{g,h-1}\,p_{i,j+1} - i\,p_{g+1,h}\,p_{i-1,j} - j\,p_{g,h+1}\,p_{i,j-1}$   (N large)   [13:128]

Covariance of product moments from the means (two
variables)


Since formulas [13:127] and [13:128] pertain to
deviations from the means, none of g, h, i, or j
may = 1. Formulas applying without this restriction
as given by Karl Pearson (op. cit.) are
[13:129] and [13:130]. As in the two preceding
formulas g and i refer to the powers of a first
variable and h and j to the powers of a second
variable. When the origins of these variables
are fixed points we have the variance formula
[13:129] and the covariance formula [13:130].
In each instance N should be ample.
N ИР) = Pop apo Pry ce + [13:129]


№, c, VE 


=P 
ghe 1] 


P j 


Р a Ne(P,,P, 


841,84) ан 1] 
[13: 130] 
Special cases of the two preceding formulas 
in all of which terms of order less than 1/N are 
neglected, and certain further formulas are 
given herewith: 


ghi 


5 РИ) Assumption of normality [13: 131] 


(озо) cune. Assumption of normality [13:132 
Assumption of rec- к 
P35 2223) ту; бү 02 tilInearity only) [13:133 


Ү(8,с:) = 0 Assumption of symmetry - [13:134 


г( оу) = r(M,V) = By ІЗ 135] 
Jeni 


fe 
r(r;,04) au Assumption of normality [13: 136] 


us Tip VES. Е var 
AG a гаг eee 
Xl) DLE 
(1-. 2508, ууу из гб z i" 
“Гүр 
113: 137] 
r(r, duca 0 Assumption of normality [13: 138] 


Let $a = M_1 - b_{12} M_2$, the constant term in a regression
equation, then
2
c(ab,,) x Pow VE Assumption of погта!- [ 13: 139] 
N V, s ity) 
Involving three variables we have, 
T35(7357115754) * Ti (Ti57755754) 
(157259: 720 


Assumption of normality [13:140] 


r(04554) = 


= © 2. 2 
2N c(ri,r,,) roy Fok 5-2 Iri) 


2 2 2 E 
+гуогуз(гіз+гуз+г2,)] [13:141] 
Assumption of normality 


Involving four variables we have, 
N c(r35754) 5 Fash ayti ayia Tay 


~ Fy ышы 3473 47 T а ы 


1 
2 2 2 2 
Ети, „(түз+гү,+г;,+г2,) 

Assumption of normality [13: 142] 


Formulas [13:140], [13:141], and [13:142] were
first given by Filon and Pearson (1898, pp. 229-311).
Romanowsky (1925) gives the following formula 


for the variance of a regression coefficient: 
220 
y (2552 Assumption of normality 


v, САЗ) ef bowa] [13:143] 


ИБ.) = 


Karl Pearson (op. cit.) gives the following variance
for the constant term in a two variable regression
equation:

$V(a) = (M_2^2 + V_2)\,V(b_{12})$   Assumption of normality   [13:144]


The following formula, holding for any form of
bivariate distribution if N is large, for the
variance error of r was first given by Sheppard
(1898).

$V(r) = \frac{r^2}{N}\left[\frac{p_{22}}{p_{11}^2} + \frac{1}{4}\left(\frac{p_{40}}{p_{20}^2} + \frac{p_{04}}{p_{02}^2} + \frac{2p_{22}}{p_{20}p_{02}}\right) - \left(\frac{p_{31}}{p_{11}p_{20}} + \frac{p_{13}}{p_{11}p_{02}}\right)\right]$   [13:145]


Formulas for a number of the preceding standard
errors and correlations, not involving the assumption
of normality, are given by Isserlis
(1916). In another article (1918) he gives reduction
formulas for higher product moments in
normal multivariate distributions such, for example,
as for

$p_{112}\ (= \Sigma x_1 x_2 x_3^2/N)$


The eight following formulas, holding for
samples from a normal bivariate population, are
from Wishart (1928):

$M(c_{12}) = \frac{N-1}{N}\,\hat{c}_{12}$, or in other notation $M(p_{11}) = \frac{N-1}{N}\,\hat{p}_{11}$   Normal bivariate population   [13:146]

$N^2 V(V_1) = 2(N-1)\hat{V}_1^2$   See [6:58]   [13:147]

$N^2 c(V_1 V_2) = 2(N-1)\hat{c}_{12}^2$   See [13:126]. Normal bivariate population   [13:148]

$N^2 V(c_{12}) = (N-1)(\hat{V}_1\hat{V}_2 + \hat{c}_{12}^2)$   Normal bivariate population   [13:149]

$N^2 c(V_1 c_{12}) = 2(N-1)\hat{V}_1\hat{c}_{12}$   Normal bivariate population   [13:150]

$N^2 c(V_1 c_{23}) = 2(N-1)\hat{c}_{12}\hat{c}_{13}$   Normal trivariate population   [13:151]

$N^2 c(c_{12} c_{13}) = (N-1)(\hat{V}_1\hat{c}_{23} + \hat{c}_{12}\hat{c}_{13})$   Normal trivariate population   [13:152]

$N^2 c(c_{12} c_{34}) = (N-1)(\hat{c}_{13}\hat{c}_{24} + \hat{c}_{14}\hat{c}_{23})$   Normal quadrivariate population   [13:153]

Sundry uni-, bi-, and multivariate formulas 
for various moments and product moments, holding 
for small samples of any form are given by both 
Pepper (1929), and Feldman (1935). 


SECTION 15. SEQUENTIAL ANALYSIS 


This is a method wherein observations of the
items in a lot are taken in succession until the
cumulated evidence warrants the acceptance or
rejection of the lot. This inspectional use of
sequential analysis differs only in minor aspects
from its use in experimentation and in quality
control.

We here consider its use in connection with
inspection of a product which can be classified as
defective, or non-defective, and in which the number
of cases in the lot is large in comparison with
the undetermined, but much smaller number, inspected.
Adaptation to the inspection of a product
having a graduated score as, for example, tensile
strength of a piece of metal or scholastic standing
of a college student is, in general, simple.

The abscissa of Chart XIII XII is the number 
of items inspected and the ordinate is the num- 
ber of defective items. The broken line indi- 
cates the results as found after one item, two 
items, three items, etc. have been inspected. 
This sample number line is also a time line showing
the amount of information given at each suc-
cessive inspectional stage. All the information 
due to earlier inspections is inherent in the 
status shown by the last plotted point on the 
line. For any point, the ordinate divided by the 
abscissa gives the proportion of defective items, 
an unbiased estimate of the true proportion in 
the lot. Of course this estimate is the more 
reliable the more items are inspected. 

The proportion of defectives found at any 
stage of inspection is an unbiased estimate of 
the proportion maintaining in the lot, but when 
the number of items inspected is small the estimate
is relatively quite unreliable, particular-
ly when the proportion is very high or very low. 

Prior to any knowledge of the actual number of
defectives found after n inspections, the ordinate
at n can be divided into three parts: an upper
portion wherein the number of defectives is so
great that the lot is rejected, a middle portion
wherein decision is reserved, and a lower portion
wherein the number is so small that the lot is
accepted. This statement holds after the first
few inspections, for in practical situations a
final decision either way would not be made upon
the basis of one or two or other small number of
inspections even if all the items inspected were
perfect or all imperfect. Since the division of
every ordinate into these three parts is possible,
we obtain, by connecting homologous points, the
three regions shown on Chart XIII XII. It is
necessary to lay down specific conditions so that
the lines demarking these regions may be specifically
defined. Two lines, those for $d_1$ and $d_2$,
which, conveniently, turn out to be straight lines
have particular merit. These lines are functions
of $p_1$, $p_2$, $\alpha$, and $\beta$ defined as follows:

$p_1$ is a proportion of defectives so small that
a lot having this proportion is considered acceptable.

$\alpha$ is the risk of rejecting a lot in which $p = p_1$.
This risk is generally small for it is considered
highly undesirable to reject so excellent a lot.

$p_2$ is a proportion of defectives, greater than
$p_1$, of such size that a lot having this proportion
is considered unacceptable.

$\beta$ is the risk of accepting a lot in which $p = p_2$.

d tye for ea 

The operating characteristic or OC curve, as
shown in Chart XIII XIII, indicates the relationship
between $p_1$, $p_2$, $\alpha$, and $\beta$. The ordinate $L_p$ is
the relative frequency of acceptance of a lot
having p proportion of defectives under the particular
sampling plan chosen. The ideal OC curve
would be step-shaped. There would be some value p'
(p' = s as given by [13:156]) such that all lots
having a smaller p would be accepted indubitably.
Such a situation can be brought about if the inspection
plan calls for sampling the entire lot.

It may be observed at this point that standard
statistical procedures antedating the development
of sequential analysis, would endeavor to estimate
from a sample the value of p in a lot, compare it
with a standard proportion s, co-determinate, as
shown later, with $p_1$ and $p_2$, and draw a conclusion
as to acceptance or rejection. In standard statistics
each of the two hypotheses can be tested
independently. In sequential analysis a decision
between the two is forced and must be accepted.
Properly used, there is no systematic difference
in outcomes, but it has been pointed out that the
sequential analysis method will generally result
in savings in the number of cases sampled in the
neighborhood of 50 per cent. The question of the
task placed upon the executive who decides upon
the sampling scheme is also pertinent. If it is
a sequential analysis scheme, he must choose,
from a priori considerations, the crucial values
$p_1$, $p_2$, $\alpha$, and $\beta$, while if it is a standard scheme
he must choose s and acceptable probability limits
both for p below and for p above s.

CHART XIII XII

REGIONS OF ACCEPTANCE AND REJECTION

(Ordinate: number of defects. Abscissa: number of
inspections. Regions, from top to bottom: Reject
($R^2_m$ region), Reserve Decision ($R_m$ region), Accept
($R^1_m$ region).)
The bounding lines $d_1$ and $d_2$ of Chart XIII
XII are functions of $h_1$ and $h_2$, which are functions
of $p_1$, $p_2$, $\alpha$, and $\beta$, and of s which is a
function of $p_1$ and $p_2$ only.

$h_1 = \frac{\log\frac{1-\alpha}{\beta}}{\log\frac{p_2}{p_1} + \log\frac{1-p_1}{1-p_2}}$   [13:154]

$h_2 = \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_2}{p_1} + \log\frac{1-p_1}{1-p_2}}$   [13:155]

$s = \frac{\log\frac{1-p_1}{1-p_2}}{\log\frac{p_2}{p_1} + \log\frac{1-p_1}{1-p_2}}$   [13:156]

$d_1 = -h_1 + sn$   Line below which lies acceptance   [13:157]

$d_2 = h_2 + sn$   Line above which lies rejection   [13:158]

The proof of these formulas is given in the 
revised report of the Statistical Research Group of 
the Applied Mathematics Panel (1945). Other im- 
portant contributions to the early development 
of sequential analysis are Neyman and Pearson 
(1936), Hotelling (1941), Bernbaum (1942), Bartky 
(1943), Wald (1943), (1944 general), (1944 cumulative),
(1945); Freeman (1944), Wald and Wolfowitz
(1944), (1945).

The bounding lines of the chart can be plotted
prior to the inspection of the first item. If
upon inspection the first item is perfect, the
point with abscissa 1 and ordinate 0, i.e., (1,0),
is plotted, while if defective, the point (1,1)
is plotted. In either
case, the plotted point lies in the region indicating
further inspection as necessary, so a second
observation is taken and the outcome plotted and
the process is continued until a plotted point is
obtained in the region of rejection or in that of
acceptance, thus terminating the inspection.

Alternative to this graphic procedure, one may
draw up a table giving, for each successive value
of n, a value of d above which lies rejection and
a value of d below which lies acceptance.
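
The tabular form of the procedure lends itself to a short program. The Python sketch below assumes the same forms of h_1, h_2, and s as are used for the bounding lines, with acceptance when d falls on or below -h_1 + sn and rejection when d falls on or above h_2 + sn; the function name and the coding of items as 0 (perfect) or 1 (defective) are illustrative.

```python
from math import log


def inspect(outcomes, p1, p2, alpha, beta):
    """Run items (0 = perfect, 1 = defective) against the two lines."""
    denom = log(p2 / p1) + log((1 - p1) / (1 - p2))
    h1 = log((1 - alpha) / beta) / denom
    h2 = log((1 - beta) / alpha) / denom
    s = log((1 - p1) / (1 - p2)) / denom
    d = 0   # defectives found so far
    n = 0   # items inspected so far
    for n, item in enumerate(outcomes, start=1):
        d += item
        if d <= -h1 + s * n:
            return "accept", n
        if d >= h2 + s * n:
            return "reject", n
    return "continue", n


print(inspect([0] * 100, 0.05, 0.15, 0.02, 0.04))  # a run of perfect items
print(inspect([1] * 10, 0.05, 0.15, 0.02, 0.04))   # a run of defective items
print(inspect([0] * 10, 0.05, 0.15, 0.02, 0.04))   # too few items to decide
```

With the standards of the illustrative problem, an unbroken run of perfect items leads to acceptance at the 29th item and an unbroken run of defectives to rejection at the 4th.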

The operating characteristic curve (the OC
curve) of Chart XIII XIII is derivable from the
properties of Wald's (1945 sequential) probability,
or likelihood, ratio L.

CHART XIII XIII

OC CURVE

(Ordinate: $L_p$. Abscissa: p.)


pele 
bee ee 3:150 
This is the ratio of the probability of the observed
data arising if the second hypothesis,
viz. $p = p_2$, is true to the probability of its
arising under the alternate hypothesis, viz. $p = p_1$.

Two constant values of L, which we designate
A and B, exist, which divide the m-dimensional
space (m = 1, 2, ..., n) into three mutually exclusive
and exhaustive parts, $R^1_m$, $R^2_m$, and $R_m$.
Corresponding to the division of the m-dimensional
space is a division as shown of the space in
Chart XIII XII. If the sub-samples consist of
one observation each, the m dimensions are the m
observations, or scores, 0 or 1. $R^1_m$ is the region
in the m-space wherein the $p = p_1$ hypothesis
is tenable with a risk not greater than $\alpha$. $R^2_m$
is the region wherein the alternative hypothesis
$p = p_2$ is tenable with a risk not greater than $\beta$.
$R_m$ is the remaining region and herein neither of
these hypotheses is tenable with the small risk
assigned to it.

The sequential analysis is terminated at the
smallest value n of m for which the sample point
lies either in $R^1_m$ or $R^2_m$. If in $R^1_m$, we accept
the first hypothesis and if in $R^2_m$, we accept the
second hypothesis.

We may take the following as the values of A
and B:

$A = \frac{1-\beta}{\alpha}$   [13:160]

$B = \frac{\beta}{1-\alpha}$   [13:161]

These are close approximations if $\alpha$ and $\beta$ are
small and the approximation is such as not to
decrease the strength of the test, but perhaps
increase very slightly the necessary number of
observations. The logarithms of A and 1/B are
needed so we write

$a = \log A = \log\frac{1-\beta}{\alpha}$   [13:162]

$b = \log\frac{1}{B} = \log\frac{1-\alpha}{\beta}$   Note [13:171]   [13:163]
It will be useful to note that 
Sa se 13:164 


If, in Chart XIII XII, an ordinate is drawn
at any point n and bounded by the lines d = 0
and d = n, it is divided into three parts. The
ratio of the part lying in the reserve decision
region to the sum of the parts in the other regions
may be computed. This ratio obviously decreases
as n increases, which is to say that,
for practical purposes, if the sequential analysis
is carried far enough, a point outside the
reserve decision region will be attained, thus
terminating the problem.

We can note that if, after a very large number
of observations, a point in the reserve decision
region is attained, the ratio of the number of
defectives to the number of observations approaches
the slope of the lines bounding this region,
thus this slope, s as defined in [13:156], is
also p', the standard value of p between $p_1$ and
$p_2$, for which it is indifferent whether the lot
is accepted or rejected. Note that s is a function
of $p_1$ and $p_2$ only and not of the risks $\alpha$
and $\beta$.

Reference to Chart XIII XII shows that the
value of n given by the intersection of $d = -h_1 + sn$
and d = 0 gives the minimal number of observations
which could lead to the acceptance of the
lot. The value of n given by the intersection
of $d = h_2 + sn$ and d = n gives the minimal number
of observations which could lead to the rejection
of the lot. Ordinarily some number of
observations greater than either of these is
necessary to reach a conclusion. The average
sample size, $\bar{n}$, necessary to reach a decision,
forms a distribution somewhat as indicated in
Chart XIII XIV. We let the average sample number
(ASN), required when $p = p_1$, be designated $\bar{n}_{p_1}$
and when $p = p_2$, we designate it $\bar{n}_{p_2}$. These may
be the average sample numbers with which one
is most concerned but, since the lot p is likely
to lie between $p_1$ and $p_2$, a generally more informative
ASN is $\bar{n}_s$. Certain ASN values are

CHART XIII XIV 


ASN CURVE 


Tp 


пр 
204 м. 
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8, 
[si 
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approximately as follows: 
(їл =a h, 


no [13:165] 
Pi $ > Pp, 
(ЕВЕ, - вв 
S1... [3:166 
Po pas бА 


А good idea of the entire ASN curve can be 
gotten by plotting five points, n. , п , and 


. Pi P2 
the following: 
nd doses [13:167] 
e h 
о 27.10: [13: 168] 
Ines 
A 
B, 22172... . . [13:169] 
* s(l-s) : 


D, is not the maximum point of the curve, but as 
indicated in Chart XIII XIV it lies near this 
maximum point. 

The cost and time arrangements in connection with an inspection
scheme should be sufficiently flexible to meet a situation in which
the actual number of inspections required to reach a conclusion is
2.5 or 3 times as great as that indicated by the greater of
$n_{p_1}$ and $n_{p_2}$.


The sequential analysis just discussed is that known as the
"binomial case" because the scores, defective or non-defective, yield
a binomial distribution. We here give a numerical illustration of
this case and then discuss the normal (wherein the distribution of
scores is normal), the Poisson, and other cases.

Illustrative numerical problem. A lot of 10,000 bolts is to be
inspected to determine whether it shall be accepted or rejected. The
standards set by the accepting agency are:


$p_1$ = .05, i.e., if not over 5 per cent are defective with respect
        to the characteristic in question, the lot is acceptable.


$\alpha$ = .02, i.e., the risk of rejecting so good a lot is not
        greater than .02.


$p_2$ = .15, i.e., if over 15 per cent are defective, the lot is not
        acceptable.


$\beta$ = .04, i.e., the risk of accepting so poor a lot is not
        greater than .04.


We compute by [13:154], [13:155], [13:156], [13:169], [13:157], and
[13:158]:


$h_1$ = 2.64

$h_2$ = 3.20

$s$ = .09194

$n_s$ = 101    2 times 101 is judged not to be a prohibitive number
               of inspections.

$d_1$ = -2.6 + .092 n

$d_2$ = 3.2 + .092 n


As a result of inspection, the performance curve shown in Chart
XIII XII was obtained, leading to rejection after but 17 bolts were
inspected. Since 17 is considerably smaller than 51 (= $n_{p_2}$),
it is probable that the actual percentage of defectives is in excess
of 15.

The tabular treatment of Table XIII J is equivalent to this graphic
procedure. Columns n, $d_1$, and $d_2$ are drawn up prior to the
first inspection. The recorded $d_2$ values are the integral values
next higher than fractional values ordinarily given by [13:158] and
the recorded $d_1$ values are the integral values next smaller than
fractional values given by [13:157]. The entries in column d are
recorded as 1, 2, 3, etc., inspections are made. Since, after 17
inspections, a value in the d column is attained as great as the
value in the $d_2$ column, the lot is rejected. If the lot had been a
much better lot, a value would have been attained, after a certain
number of inspections, as small as the value in the $d_1$ column and
the lot would have been accepted.
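
The figures of the illustrative problem can be checked with a short computation. The sketch below is only an illustration: it assumes that [13:154] to [13:156] are the usual Wald expressions for the intercepts and slope (they are not reproduced in this section, but these expressions return the quoted 2.64, 3.20 and .092), and the function names are hypothetical.

```python
import math

def binomial_sequential_plan(p1, p2, alpha, beta):
    """Intercepts, slope and approximate ASN values for the binomial case.

    Assumed forms (standard Wald expressions, not quoted from the text):
      h1 = ln((1-alpha)/beta) / g,  h2 = ln((1-beta)/alpha) / g,
      s  = ln((1-p1)/(1-p2)) / g,   g  = ln(p2(1-p1) / (p1(1-p2))).
    """
    g = math.log(p2 * (1 - p1) / (p1 * (1 - p2)))
    h1 = math.log((1 - alpha) / beta) / g   # acceptance line: d1 = -h1 + s*n
    h2 = math.log((1 - beta) / alpha) / g   # rejection line:  d2 =  h2 + s*n
    s = math.log((1 - p1) / (1 - p2)) / g
    asn = {
        "n_p1": ((1 - alpha) * h1 - alpha * h2) / (s - p1),
        "n_p2": ((1 - beta) * h2 - beta * h1) / (p2 - s),
        "n_s": h1 * h2 / (s * (1 - s)),
    }
    return h1, h2, s, asn

def run_inspection(outcomes, h1, h2, s):
    """Walk through 0/1 outcomes (1 = defective) until a boundary is reached."""
    d = 0
    for n, defective in enumerate(outcomes, start=1):
        d += defective
        if d >= h2 + s * n:
            return "reject", n
        if d <= -h1 + s * n:
            return "accept", n
    return "continue", len(outcomes)

h1, h2, s, asn = binomial_sequential_plan(0.05, 0.15, 0.02, 0.04)
```

With these values an unbroken run of defectives leads to rejection at the fourth bolt, and an unbroken run of good bolts leads to acceptance at the twenty-ninth, which are the minimal numbers given by the intersections described earlier.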

The normal case (involving a one-sided alternative test of the
mean): If the score, X, attaching to an observation is a normally
distributed score with known standard deviation, $\sigma$ (and
variance V), a precise sequential analysis test of the difference
between the two means is available. We have


$M_1$ the smaller mean


$M_2$ the larger mean


a the risk of believing a lot to have а 
mean of M, when it in fact has a mean 
of M, 

В the risk of believing a lot to have a 
mean of И, when it in fact has a mean 
of M, 


We here desire, for purposes of acceptance, 
lots whose means exceed M, and this is a one- 
sided alternative. Had we desired lots whose 
means neither exceeded nor fell short of M, by a 
certain amount we would have had a two-sided al- 
ternative, a readily soluble problem (SRG Report 
255, Sec. 5, 1945).

Let a and b be the following natural logarithms:


1-8 


ая 


Note [13:162] [13: 170] 


E l-a 


Note [13:163] [13: 171] 


We have for the indifferent mean and also for the slope of the
acceptance and rejection boundary lines

$$s = \frac{M_1 + M_2}{2}$$ [13:172]


The intercepts of these boundary lines are $h_2$ and $-h_1$.


[goce EN [13:173] 


ОЕ .. [13:174] 


The acceptance and rejection boundary lines are given by [13:157]
and [13:158]. The operating characteristic curve (a left-right
reflection of a Chart XIII XIII curve, with abscissa labeled M
instead of p) is


sese. 
DER DE S. [13:175] 
et2 -1 Е 
in which 
D GIRO D а [13:176] 


t, = 2 (s-M) (hh), . . , [13:177] 


$L_M$ is the relative frequency of accepting a lot whose mean
quality is M. $L_M$ is of indeterminate form when M = s, but it can
be evaluated. We
hy
о Ооо = 32118] 
В+ h, 


Average sample numbers are readily obtained. 
САРЫ 
ey и = в 


СЗ: 


ее 


в AGL ey ie у 
m, Я теми [13: 181] 


(H, = My)? 
2 _ 2(1-2a- 
л сые к 
2 (и, - "NC 


We need criteria for the acceptance of M = M, 
and of M = M,. Since we have a normal distri- 
bution the ratio of the likelihood that the sam- 
ple in question would arise under the hypothesis 
M = M, to its likelihood under the alternative 
hypothesis И = M, is simply, 


1 
exp ту Х(Х-н,)7 
L= с 1135183] 


Taking natural logarithms, which does not dis- 
turb the relationships [13:162] and [13:163], we 
can readily obtain the following criteria: 


2 > Иа 
TS TET 


Accept M = И, [13: 184] 


2 1
SEK
Has шлан Accept И! (13:1851 
1 
#7775 y : 
«(5Х-нв) МЫ" 113:1861 
DEA “нь 1 testing 


The actual solution may be carried out graphically by means of such
a chart as XIII XII or by tabular methods, recording, for successive
n's, the values of $\Sigma X$ which lead to the acceptance of
$M = M_1$ and again of $M = M_2$.

Further applications of sequential analysis: The method applies to
many situations of which the following will be noted:

Double dichotomies: (SRG 255, Sec. 3 revised, 1945) Herein the
problem is that of choosing between two processes, each of which
leads to one of two results. An observation (random) of a first
process case is compared with an observation (random) of a second
process case, with one of three outcomes, (a) first process superior,
(b) first and second processes equal, or (c) first process inferior.
This is continued until the sequential analysis test shows one of the
processes to be superior, within the risk limits set.

Test of variability: (SRG 255, Sec. 6 revised, 1945) The question
here is to decide whether the standard deviation of a submitted lot
differs (within some small assigned risk) from some standard, or
acceptable, standard deviation. The solution is dependent upon a
normal distribution.

Test in the case of a Poisson distribution: (SRG 255, Sec. 7, 1945)
The chief hazard here lies in the assumption of a Poisson
distribution. The experimental determination that a Poisson
distribution is present is in general far more difficult than the
sequential analysis procedure that follows therefrom.


CHAPTER XIV 


MATHEMATICS, THE MENTOR OF
STATISTICAL INGENUITY


SECTION 1. INGENUITY IN RESEARCH


Cleverness in laboratory technique, in utilizing unique properties
of materials, and in measuring differences rather than gross
magnitudes has resulted in many of the great discoveries of modern
physical, biological, and even social science. Another field for
ingenuity has been relatively neglected. It has been a tacit
assumption that the elementary associative and distributive laws of
mathematics hold. Generally this has proved serviceable and where it
has not, the disproof of the assumption has frequently been the most
direct way of discovering some important truth. There are
innumerable relationships between and within mathematical functions
and unique to the particular functions employed, which have not been
exploited in connection with phenomena. It seems that it has not
been customary to look to mathematics for the key to material and
biological behavior. The writer believes that mathematical
relationships are never the exact key and that living behavior is
always more complex and always conditioned by more factors than are
included in a mathematical formulation, but nevertheless such
formulation may be an approximate key and one of the richest sources
for suggestions as to life behavior.


A striking illustration of this principle is to be found in the
properties of Latin and Graeco-Latin squares and in the design of
experiments as developed by R. A. Fisher (1937) and his school. With
plot designs consisting of rows and columns and equal numbers of
observations in each cell, there is zero correlation between the
frequencies in the rows and the frequencies in the columns. This is
a property discovered through the study of mathematics. Agricultural
experimentalists have been shrewd enough to incorporate this
principle in their experimental designs and they have obtained an
economy of procedure and a certainty of conviction not previously
possible. It is in this sense that mathematics is to be thought the
mentor of statistical ingenuity. The use of the properties of the
normal distribution, the principles of sequential analysis, the
merits of independent variables as availed of in the analysis of
variance and in multivariate analysis into principal components, are
a few additional illustrations of illuminating transfer from
mathematics to living phenomena.


There are innumerable other mathematical properties which have never
been availed of in experimental design and procedure. In the
following sections are given a number of mathematical functions
having unique properties which may be suggestive of phenomenological
relationships. This chapter is intended as a background, though all
too brief, for the experimentalist so that, if his data do show some
of the symptoms of, say, a particular mathematical function, he can
then seek out all the invariant properties of the function and return
to his data with the hypothesis that it has parallel invariant
properties. Mathematics should be more than a cultivator that is
dragged behind a lead horse; it should be the horse itself, and
frequently a spirited one, that chooses the ground that is to be
cultivated.


SECTION 2. MATRICES AND DETERMINANTS


A few facts about matrices and determinants will assist in
understanding many theoretical as well as applied problems in
statistics. A matrix is a rectangular array of terms, called
elements. It does not have a quantitative value. A matrix may be
designated by a single letter (usually script capital $\mathcal{A}$,
$\mathcal{B}$, etc.), by a complete recording of all its elements, by
a partial recording with dots to indicate obvious non-recorded
elements, by a designation of the general element $a_{ij}$, or thus,
$\|a_{ij}\|$. Also in the case of a square matrix, by giving the
terms in the principal diagonal, thus
$\|a_{11}\ a_{22}\ \cdots\ a_{kk}\|$.

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{k1} & a_{k2} & \cdots & a_{kn} \end{Vmatrix} = \|a_{ij}\|$$ [14:01]

The double rules are used to distinguish between a matrix and a
determinant.

A matrix having k rows and n columns is of order kn.

Two matrices are equal only if each element 
of the one is equal to an element in correspond- 
ing position in the other. 

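Matrix $\mathcal{A}'$ derived from matrix $\mathcal{A}$ by
interchanging rows and columns is called its transpose. E.g.,

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{Vmatrix} \quad\text{and}\quad \mathcal{A}' = \begin{Vmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{Vmatrix}$$ [14:02]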

The adjoint of a square matrix $\mathcal{A}$ is the matrix whose
elements are the cofactors (defined later) of the elements of
$\mathcal{A}'$.

A square matrix is symmetric if and only if it is equal to its
transpose. Thus an element $a_{ij}$ must equal its conjugate
$a_{ji}$.

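The addition of matrices: If three matrices $\mathcal{A}$,
$\mathcal{B}$, and $\mathcal{C}$, having elements $a_{ij}$, $b_{ij}$,
and $c_{ij}$, are such that $\mathcal{C}$ is the sum of $\mathcal{A}$
and $\mathcal{B}$,

$$\mathcal{C} = \mathcal{A} + \mathcal{B}$$

then $c_{ij} = a_{ij} + b_{ij}$, these elements being added in the
ordinary algebraic sense.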

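Matrix multiplication by a scalar: If the matrix $\mathcal{A}$ is
multiplied by a scalar (a real number, as distinguished from a
vector) b, then every element in $\mathcal{A}$ is multiplied by b,
thus, e.g.

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{Vmatrix} \quad\text{and}\quad b\mathcal{A} = \mathcal{A}b = \begin{Vmatrix} ba_{11} & ba_{12} & ba_{13} \\ ba_{21} & ba_{22} & ba_{23} \end{Vmatrix}$$ [14:03]

The multiplication of matrices: With matrix theory meanings attached
to the terms pre- and post-multiplication, let the matrix
$\mathcal{C}$ be equal to matrix $\mathcal{A}$ post-multiplied by the
matrix $\mathcal{B}$. The element in the ith row and jth column of
$\mathcal{C}$ is equal to the sum of the products of the successive
elements in the ith row of $\mathcal{A}$ with the successive elements
in the jth column of $\mathcal{B}$. E.g., let

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{Vmatrix} \quad\text{and}\quad \mathcal{B} = \begin{Vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{Vmatrix}$$ [14:04]

then

$$\mathcal{C} = \mathcal{A}\mathcal{B} = \begin{Vmatrix} a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31} & a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32} \\ a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31} & a_{21}b_{12}+a_{22}b_{22}+a_{23}b_{32} \end{Vmatrix}$$

In this product $\mathcal{B}$ is pre-multiplied by $\mathcal{A}$, or
$\mathcal{A}$ is post-multiplied by $\mathcal{B}$. The multiplication
operation is not commutative, for in general
$\mathcal{A}\mathcal{B} \ne \mathcal{B}\mathcal{A}$.

The transpose of the product of any number of matrices is equal to
the product of their transposes taken in reverse order. E.g.,
$(\mathcal{A}\mathcal{B}\mathcal{C})' = \mathcal{C}'\mathcal{B}'\mathcal{A}'$.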
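
A small numerical case may make the row-by-column rule and the transpose rule concrete. The following is a minimal sketch in plain Python; the helper names `matmul` and `transpose` and the numbers used are illustrative only.

```python
def matmul(A, B):
    """Product C = A B: c_ij is the ith row of A times the jth column of B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 rows, 3 columns
B = [[1, 0],
     [2, 1],
     [0, 3]]             # 3 rows, 2 columns

C = matmul(A, B)         # A post-multiplied by B: 2 rows, 2 columns
D = matmul(B, A)         # B post-multiplied by A: 3 rows, 3 columns
```

Here `C` and `D` are not even of the same order, which is the plainest way in which the two products can differ.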

The identity matrix $\mathcal{I}$ is the following square matrix:

$$\mathcal{I} = \begin{Vmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 1 \end{Vmatrix}$$ [14:05]

It is such that
$\mathcal{A}\mathcal{I} = \mathcal{I}\mathcal{A} = \mathcal{A}$.

For a square matrix $\mathcal{A}$, with elements $a_{ij}$, let
$\Delta$ = the determinant having the same elements, $a_{ij}$, as
matrix $\mathcal{A}$, and let $A_{ij}$ be the cofactors of the
elements $a_{ij}$. Then the inverse of $\mathcal{A}$ is
$\mathcal{A}^{-1}$ and is given by

$$\mathcal{A}^{-1} = \frac{1}{\Delta}\,\|A_{ji}\| = \frac{1}{\Delta} \times (\text{adjoint of } \mathcal{A})$$ [14:06]

Note that the element that is at the intersection of the ith row and
jth column in the adjoint is not the cofactor of $a_{ij}$ but of
$a_{ji}$. A matrix and its inverse are such that

$$\mathcal{A}\mathcal{A}^{-1} = \mathcal{A}^{-1}\mathcal{A} = \mathcal{I}$$ [14:07]

The multiplication of matrices is associative. E.g.,
$\mathcal{A}\mathcal{B}\mathcal{C} = \mathcal{A}(\mathcal{B}\mathcal{C}) = (\mathcal{A}\mathcal{B})\mathcal{C}$,
the order of the letters being the same in all instances.

The inverse of any product of matrices is the product of their
inverses in reverse order.
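
The inverse by way of the adjoint can be sketched in a few lines. The code below is an illustration only, with hypothetical helper names; it uses exact fractions so that the product of a matrix and its inverse can be compared with the identity matrix without rounding.

```python
from fractions import Fraction

def minor(M, i, j):
    """Array left after crossing out row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by expansion along the first row."""
    if not M:
        return 1                      # empty determinant, needed for 1 x 1 cofactors
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def cofactor(M, i, j):
    """Signed first minor attaching to position i, j."""
    return (-1) ** (i + j) * det(minor(M, i, j))

def inverse(M):
    """Adjoint divided by the determinant."""
    d = det(M)
    if d == 0:
        raise ValueError("vanishing determinant: no inverse")
    k = len(M)
    # the element in row i, column j of the adjoint is the cofactor of a_ji
    return [[Fraction(cofactor(M, j, i), d) for j in range(k)] for i in range(k)]

M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 2]]
M_inv = inverse(M)
```

Expansion by cofactors is convenient for small arrays such as this one; for large orders it is far more laborious than elimination methods.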

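The multiplication of matrices is distributive. E.g.

$$\mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C}$$ [14:08]

A diagonal matrix has non-zero elements in the diagonal only. If
these elements are the same throughout, the matrix is a scalar
matrix, it having the property that pre-multiplication by it and
post-multiplication by it are the same, and further, if the common
element in the diagonal terms of the scalar matrix $\mathcal{G}$

$$\mathcal{G} = \begin{Vmatrix} g & 0 & \cdots & 0 \\ 0 & g & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & g \end{Vmatrix}$$ [14:09]

is g, then
$\mathcal{A}\mathcal{G} = \mathcal{G}\mathcal{A} = \mathcal{A}g = g\mathcal{A}$
= a matrix having elements $\|g a_{ij}\|$.

In an equation of matrices a multiplying factor may be moved to the
other member of the equation by substituting its inverse, while
maintaining its same relative position. E.g.

$$\mathcal{A}\mathcal{B}\mathcal{C} = \mathcal{D}$$

$$\mathcal{B}\mathcal{C} = \mathcal{A}^{-1}\mathcal{D}$$

$$\mathcal{C} = \mathcal{B}^{-1}\mathcal{A}^{-1}\mathcal{D}$$ [14:10]

also

$$\mathcal{A}\mathcal{B} = \mathcal{D}\mathcal{C}^{-1}$$

$$\mathcal{A} = \mathcal{D}\mathcal{C}^{-1}\mathcal{B}^{-1}$$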

Determinants: A matrix, including a square matrix, is just an
arrangement of elements, but if the elements in the square matrix
$\mathcal{A}$, equal to $\|a_{ij}\|$, are evaluated as the
determinant $\Delta$ (also frequently designated $|a_{ij}|$)

$$\Delta = |a_{ij}| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \cdots & \cdots & \cdots & \cdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}$$ [14:11]

we have an ordinary algebraic polynomial.

A determinant of k rows (and k columns) is one of the kth order.

The k! terms in the polynomial $\Delta$ cover all possible
combinations yielding terms having one element from each row and each
column. The sign factor that attaches to each term can be determined
by ascertaining whether the number of inversions in the first
subscripts plus the number in the second subscripts is even or odd.
If even the sign is positive, and if odd it is negative. E.g.,
consider the following term in a fifth order determinant:
$a_{15}a_{23}a_{32}a_{41}a_{54}$. The first subscripts, 1, 2, 3, 4,
5, show no inversions and the second subscripts 5, 3, 2, 1, 4, show
seven (the 5 being to the left of four smaller numbers, the 3 to the
left of two, and the 2 to the left of one). Accordingly,
$-a_{15}a_{23}a_{32}a_{41}a_{54}$ is one of the 5! terms in the
expanded fifth order determinant.

$a_{11}$ is the leading element of the determinant $\Delta$.

$a_{11}a_{22} \cdots a_{kk}$ is the leading term.


The $a_{11}$ to $a_{kk}$ is the principal diagonal, and the $a_{1k}$
to $a_{k1}$ is the secondary diagonal.


Interchanging two rows, or two columns, changes the sign of the
determinant.
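
The inversion rule for signs, and the change of sign upon interchanging two rows, can be illustrated by expanding a small determinant term by term. This is a sketch for illustration (the function names are hypothetical); the k! terms make the method practical only for low orders.

```python
from itertools import permutations

def inversions(seq):
    """Number of pairs in which a larger number stands to the left of a smaller."""
    return sum(1 for i in range(len(seq))
               for j in range(i + 1, len(seq)) if seq[i] > seq[j])

def term_sign(first_subscripts, second_subscripts):
    """+1 if the total number of inversions is even, -1 if odd."""
    total = inversions(first_subscripts) + inversions(second_subscripts)
    return 1 if total % 2 == 0 else -1

def determinant(M):
    """Sum of the k! signed terms, one element from each row and each column."""
    k = len(M)
    rows = list(range(k))
    total = 0
    for cols in permutations(rows):
        term = term_sign(rows, list(cols))
        for r, c in zip(rows, cols):
            term *= M[r][c]
        total += term
    return total

M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 2]]
swapped = [M[1], M[0], M[2]]     # first two rows interchanged
```

For the fifth order term discussed above, `inversions([5, 3, 2, 1, 4])` counts the seven inversions of the second subscripts.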

If the ith row and the jth column of $\Delta$ are crossed out, the
remaining determinant is called a first minor and frequently
designated $\Delta_{ij}$. A first minor of the type $\Delta_{ii}$ is
a principal first minor.

A positive or negative sign attaches to the i, j position. It is
given by $(-1)^{i+j}$. When this sign is attached to the minor we
have a cofactor, which may be designated $A_{ij}$.

$$A_{ij} = (-1)^{i+j}\,\Delta_{ij}$$ [14:12]

A determinant may be expanded in terms of the elements of any row, or
column, and their cofactors. E.g., in terms of the elements of the
third row:

$$\Delta = a_{31}A_{31} + a_{32}A_{32} + \cdots + a_{3k}A_{3k}$$ [14:13]

$$\Delta = a_{31}\Delta_{31} - a_{32}\Delta_{32} + \cdots + (-1)^{3+k}a_{3k}\Delta_{3k}$$


The terms symmetric, positive definite, Gramian apply to
determinants as well as to square matrices.

A positive definite determinant is one in which all the principal
minors > 0.

A Gramian determinant is a symmetric positive definite determinant.
This is the characteristic type of major determinant in multiple
correlation and regression.

The rank of a non-vanishing determinant is equal to the order of the
determinant.

The rank of a vanishing determinant is equal 
to the rank of its largest non-zero minor. The 
rank of a matrix is equal to the rank of the de- 
terminant of largest rank that can be made from 
its rows and columns. 

If the elements of any row, or column, of a 
determinant are multiplied by a constant, the 
determinant is multiplied by this constant. 

If the elements of any row (or column) multi- 
plied by a constant yield the elements of any 
other row (or column), then the determinant is 
equal to zero. 

If the addition of a,, to an element, а}, of 
the non-vanishing determinant A, yields a deter- 
minant which equals zero, then a; is a factorof 
А, and аг does not appear in the expanded and 


reduced form of $\Delta$. If $a_{1a}, a_{2b}, \ldots, a_{kg}$ are k
such factors, where k is the order of $\Delta$,
$\Delta = a_{1a}a_{2b} \cdots a_{kg}$ (a sign factor). If the number
of inversions in the order a, b, ..., g is even, the sign is
positive, and if odd it is negative. In particular should the
factors be $a_{11}, a_{22}, \ldots, a_{kk}$, then

$$\Delta = a_{11}a_{22} \cdots a_{kk}$$ [14:14]


DETERMINANTAL SOLUTION OF SIMULTANEOUS EQUATIONS 


Given k simultaneous linear equations, so written that the constant
terms are to the right of the = sign, thus:


ат eater о нах о = а 
а уху + а, ох. +... + Sa X, = а, 7 а, 


; 5 г 4 22 [14:15] 


. . . . . 


Beaty карр +... + ах, “бо, = бүс 
Write down the major determinant 


епо E ne n 


ЕЯ. = азу eter а эр [14:16] 


а, а,...а 


in which the value of $a_{00}$ is immaterial in connection with the
solution for the x's, as it disappears in all the equations [14:17].
The values of the x's are given by
-Аоз 
Ago 
“Ay, 
A 
ae [14:17] 
A 
Ayo 
DA -Aok 
Aoo ШТ 


SECTION 3. THE POINT BINOMIAL


The moments of $(p+q)^n$ about its mean are readily found by use of a
reduction formula due to Romanovsky (1923). Romanovsky shows that

$$\mu_{r+1} = pq\left(nr\,\mu_{r-1} + \frac{d\mu_r}{dp}\right)$$ [14:18]

We have

$$\mu_0 = 1$$ [14:19]

$$\mu_1 = 0$$ [14:20]

The repeated use of this gives $\mu_2$, $\mu_3$, etc.


Moments of a binomial series

$$\mu_2 = npq$$ (see [9:02]) [14:21]

$$\mu_3 = npq(q-p)$$ (see [9:03]) [14:22]

$$\mu_4 = npq[1+3(n-2)pq]$$ (see [9:05]) [14:23]

$$\mu_5 = npq(q-p)[1+pq(-12+10n)]$$ [14:24]

$$\mu_6 = npq\{1+5pq[-6+5n+pq(24-26n+3n^2)]\}$$ [14:25]
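
The closed forms for the binomial moments can be verified by direct summation over the terms of the series. The sketch below is a check only, not Romanovsky's derivation; it uses exact fractions, and the chosen n and p are arbitrary illustrative values.

```python
from fractions import Fraction
from math import comb

def central_moment(n, p, r):
    """rth moment about the mean of (p+q)^n by direct summation."""
    q = 1 - p
    mean = n * p
    return sum(comb(n, x) * p**x * q**(n - x) * (x - mean)**r
               for x in range(n + 1))

def closed_form(n, p, r):
    """Moments about the mean, r = 2 to 6, from the closed expressions."""
    q = 1 - p
    pq = p * q
    formulas = {
        2: n * pq,
        3: n * pq * (q - p),
        4: n * pq * (1 + 3 * (n - 2) * pq),
        5: n * pq * (q - p) * (1 + pq * (-12 + 10 * n)),
        6: n * pq * (1 + 5 * pq * (-6 + 5 * n + pq * (24 - 26 * n + 3 * n * n))),
    }
    return formulas[r]

n, p = 12, Fraction(3, 10)
checks = {r: central_moment(n, p, r) == closed_form(n, p, r) for r in range(2, 7)}
```

Because the arithmetic is exact, each entry of `checks` is a strict equality rather than an agreement to so many decimal places.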
SECTION 4. THE POISSON DISTRIBUTION


The Poisson series is obtained from the binomial by letting p, the
probability of the event occurring, be indefinitely small, but n so
large that pn is finite. Poisson's exponential limit of the binomial
$(p+q)^n$ (see Yule and Kendall 1937, pp. 187-191) is a distribution
such that the probability of a variate (x = 0, 1, 2...) taking the
value of x is $M^x e^{-M}/x!$, or

$$e^{-M}\left(1 + M + \frac{M^2}{2!} + \frac{M^3}{3!} + \cdots\right)$$ Poisson series [14:26]

$e^{-M}$ is the relative frequency of X = 0; $e^{-M}M$ the relative
frequency of X = 1; $e^{-M}M^2/2!$ the relative frequency of X = 2;
etc. The first four moments of this distribution about the point
X = 0 are

$$\mu'_1 = M$$ [14:27]

$$\mu'_2 = M(M+1)$$ [14:28]

$$\mu'_3 = M(M^2+3M+1)$$ [14:29]

$$\mu'_4 = M(M^3+6M^2+7M+1)$$ [14:30]

The moments from M are


Moments of a Poisson series

$$\mu_1 = 0$$ [14:31]

$$\mu_2 = M$$ [14:32]

$$\mu_3 = M$$ [14:33]

$$\mu_4 = M(1+3M)$$ [14:34]

$$\mu_5 = M(1+10M)$$ [14:35]

$$\mu_6 = M(1+25M+15M^2)$$ [14:36]

The moments are determinable from a knowledge of the mean and without
knowledge of p and n separately. Thus for the Poisson, M is a
sufficient statistic.

Molina (1942) has tabled individual terms and cumulated sums for
M = 0 to .01, by intervals of .001; for M = .01 to 15 by intervals of
.01; for M = 15 to 100 by intervals of 1. It seems probable that all
ordinary needs can be served by this table.

Pearson (1914) has tabled the successive terms of the Poisson
distribution, to six decimal places, for M from .1 to 15, by tenths.
All the cumulants of a Poisson distribution are equal (and equal to
M), a fact not true for any other distribution. If
$x_3 = x_1 + x_2$, in which $x_1$ is distributed in the Poisson
manner with parameter M, and $x_2$, independent of $x_1$, is
distributed in the Poisson manner with parameter M', then $x_3$ is
distributed in the Poisson manner with parameter (M + M').
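
Both the moment formulas and the additive property can be checked numerically from the terms of the series. The following is a rough sketch under stated assumptions: the series is truncated at 100 terms, which is ample for a small mean such as the illustrative M = 2.5, and the function names are hypothetical.

```python
from math import exp, factorial

def poisson_term(M, x):
    """Relative frequency of the value x in a Poisson series with mean M."""
    return exp(-M) * M**x / factorial(x)

def moment_about(M, r, origin, terms=100):
    """rth moment about `origin`, summing the first `terms` terms of the series."""
    return sum((x - origin)**r * poisson_term(M, x) for x in range(terms))

M = 2.5
about_zero = {r: moment_about(M, r, 0.0) for r in (1, 2, 3, 4)}
about_mean = {r: moment_about(M, r, M) for r in (1, 2, 3, 4, 5, 6)}

# Sum of two independent Poisson variates with means 1.0 and 1.5:
# frequency of a total of 3, found by combining the two series term by term.
total_of_three = sum(poisson_term(1.0, k) * poisson_term(1.5, 3 - k)
                     for k in range(4))
```

The value `total_of_three` may be compared with `poisson_term(2.5, 3)`, the term of a single series whose parameter is the sum of the two parameters.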


SECTION 5. THE HYPERGEOMETRIC SERIES 


This series can be explained by reference to an example. Let there
be an urn containing n balls, of which qn are black and pn are white.
Let N balls be drawn at a time. Let X equal the number of white
balls in such a sample of N. This number will vary from sample to
sample from 0 to N, or to pn, whichever is the smaller.

When N are drawn the probability that the first of the N is not-white
is qn/n; that this having come to pass the second is not-white is
(qn-1)/(n-1); that, the preceding having come to pass, the third is
not-white is (qn-2)/(n-2); etc. to (qn-N+1)/(n-N+1), the probability
that the preceding having been not-white the Nth is also not-white.
The product of all these is the probability that when N are drawn
none will be white. This gives the proportionate probability given
in the first row of the accompanying table. The other rows are
obtained by a very similar procedure.


$$\begin{array}{cl}
X = \text{no. of white balls} & \text{Proportionate frequency of occurrence} \\[4pt]
0 & \dfrac{qn(qn-1)(qn-2)\cdots(qn-N+1)}{n(n-1)(n-2)\cdots(n-N+1)} \\[8pt]
1 & \dfrac{pn(qn)(qn-1)\cdots(qn-N+2)}{n(n-1)(n-2)\cdots(n-N+1)} \times \dfrac{N}{1} \\[8pt]
2 & \dfrac{pn(pn-1)(qn)\cdots(qn-N+3)}{n(n-1)(n-2)\cdots(n-N+1)} \times \dfrac{N(N-1)}{2!} \\[8pt]
\cdots & \cdots \\[4pt]
X & \dfrac{(pn)!\,(qn)!\,N!\,(n-N)!}{n!\,(pn-X)!\,(qn-N+X)!\,(N-X)!\,X!} \\[8pt]
\cdots & \cdots \\[4pt]
N & \dfrac{pn(pn-1)(pn-2)\cdots(pn-N+1)}{n(n-1)(n-2)\cdots(n-N+1)}
\end{array}$$ [14:37]


This series of frequencies is a hypergeometric series. When n is
infinitely large with reference to N it becomes a binomial series.

Pearson (1924) has provided formulas for the
first four moments as herewith:
Moments of a hypergeometric series


Ч = pN аа, QE EC [14:38] 
D COM Me IE ; - + [14:39] 
а Т E MEE в № ОАО в аз DEA 201 
$$\mu_2 = \frac{Npq(n-N)}{n-1}$$ [14:41]

$$\mu_3 = \frac{Npq(q-p)(n-N)(n-2N)}{(n-1)(n-2)}$$ [14:42]

$$\mu_4 = \frac{Npq(n-N)\{n(n+1)-6N(n-N)+3pq[n^2(N-2)-nN^2+6N(n-N)]\}}{(n-1)(n-2)(n-3)}$$ [14:43]
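
The moment formulas may be compared with moments computed directly from the terms of the series. The sketch below writes the general term as a ratio of binomial coefficients, which is the same quantity as the factorial expression of the table; the urn of 20 balls with 8 white and samples of 6 is an arbitrary illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeometric_term(n, white, N, X):
    """Probability of X white balls among N drawn from n, of which `white` are white."""
    return Fraction(comb(white, X) * comb(n - white, N - X), comb(n, N))

def central_moment(n, white, N, r):
    """rth moment about the mean pN by direct summation over possible X."""
    mean = Fraction(white * N, n)
    low, high = max(0, N - (n - white)), min(N, white)
    return sum((X - mean)**r * hypergeometric_term(n, white, N, X)
               for X in range(low, high + 1))

def formula_moments(n, white, N):
    """Second, third and fourth moments about the mean from the closed forms."""
    p = Fraction(white, n)
    q = 1 - p
    mu2 = N * p * q * (n - N) / (n - 1)
    mu3 = N * p * q * (q - p) * (n - N) * (n - 2 * N) / ((n - 1) * (n - 2))
    bracket = (n * (n + 1) - 6 * N * (n - N)
               + 3 * p * q * (n * n * (N - 2) - n * N * N + 6 * N * (n - N)))
    mu4 = N * p * q * (n - N) * bracket / ((n - 1) * (n - 2) * (n - 3))
    return mu2, mu3, mu4

n, white, N = 20, 8, 6
direct = tuple(central_moment(n, white, N, r) for r in (2, 3, 4))
from_formulas = formula_moments(n, white, N)
```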


An important statistic in connection with a probability function
such as the binomial, the Poisson, or the hypergeometric
distribution, is the sum of the frequencies up to a designated point.
This matter is adequately covered by tables in connection with the
Poisson distribution. Camp (1924 and 1925) gives methods for
obtaining this desired sum for both the binomial and hypergeometric
distributions. They would seem to be adequate for most practical
needs, though not of the highest precision nor of the simplicity of
computation which one would desire.

The obtaining of this sum for the still more useful normal, t
(Student's t), $\chi^2$, Pearson Type III, and variance ratio
distributions is, in all cases, possible, as explained elsewhere in
this text.

SECTION 6. FACTORIALS AND THE GAMMA FUNCTION 

The gamma function of a number x+1 is defined by

$$\Gamma(x+1) = \int_0^\infty e^{-y}\,y^x\,dy$$ The gamma function [14:44]

It is such that

$$\Gamma(x+1) = x\,\Gamma(x)$$ reduction formula [14:45]

This holds for all values of x, integral, fractional, positive or
negative. The term factorial is commonly applied to positive
integers, in which case

$$x! = x\,(x-1)!$$ Factorial reduction formula [14:46]

When x is a positive integer

$$\Gamma(x+1) = x!$$ [14:47]

It is frequently serviceable to generalize the concept factorial so
that [14:46] and [14:47] hold when x is not a positive integer. When
this is done we note that by repeated applications of [14:45] the
factorial of any number can be expressed as the product of certain
numbers times the factorial of a number x', wherein
$0 \le x' \le 1$. For example


В ВЕЕ Kile 5X: 51 


Another example:

$$(-2.5)! = \frac{(-1.5)!}{-1.5} = \frac{(-.5)!}{(-1.5)(-.5)} = \frac{(.5)!}{(-1.5)(-.5)(.5)}$$

Thus, except for the labor involved, a table of x!, for values from
.00 to 1.00, or of $\Gamma(x)$ from 1.00 to 2.00, is sufficient to
yield, with operations of multiplication and division, all factorials
and all gamma functions. We provide herewith a brief table of this
sort.


TABLE XIV A
BASIC FACTORIALS AND GAMMA FUNCTIONS

   x    Γ(x+1)       x    Γ(x+1)       x    Γ(x+1)       x    Γ(x+1)

 -.01   1.00587     .25   .90640      .50   .88623      .75   .91906
  .00   1.00000     .26   .90440      .51   .88659      .76   .92137
  .01    .99433     .27   .90250      .52   .88704      .77   .92376
  .02    .98884     .28   .90072      .53   .88757      .78   .92623
  .03    .98355     .29   .89904      .54   .88818      .79   .92877
  .04    .97844     .30   .89747      .55   .88887      .80   .93138
  .05    .97350     .31   .89600      .56   .88964      .81   .93408
  .06    .96874     .32   .89464      .57   .89049      .82   .93685
  .07    .96415     .33   .89338      .58   .89142      .83   .93969
  .08    .95973     .34   .89222      .59   .89243      .84   .94261
  .09    .95546     .35   .89115      .60   .89352      .85   .94561
  .10    .95135     .36   .89018      .61   .89468      .86   .94869
  .11    .94740     .37   .88931      .62   .89592      .87   .95184
  .12    .94359     .38   .88854      .63   .89724      .88   .95507
  .13    .93993     .39   .88785      .64   .89864      .89   .95838
  .14    .93642     .40   .88726      .65   .90012      .90   .96176
  .15    .93304     .41   .88676      .66   .90167      .91   .96523
  .16    .92980     .42   .88635      .67   .90330      .92   .96877
  .17    .92670     .43   .88604      .68   .90500      .93   .97240
  .18    .92373     .44   .88581      .69   .90678      .94   .97610
  .19    .92089     .45   .88565      .70   .90864      .95   .97988
  .20    .91817     .46   .88560      .71   .91057      .96   .98374
  .21    .91558     .47   .88563      .72   .91258      .97   .98768
  .22    .91311     .48   .88575      .73   .91467      .98   .99171
  .23    .91075     .49   .88595      .74   .91683      .99   .99581
  .24    .90852                                        1.00  1.00000
                                                       1.01  1.00427

  E2     .00002                                         E2    .00001
  E3     .00000                                         E3    .00000


The $E_2$ values recorded at the bottom of the columns give the
maximal 2-point, or linear, interpolation error as determined for
entries half way down the column. Similarly the maximal 3-point, or
quadric, interpolation error, $E_3$, is given.

For gammas and factorials of small numbers the reduction formulas
[14:45] and [14:46] may first be employed and then the necessary
values found in Table XIV A. For large numbers, x > 4, Stirling's
formula

$$\ln(x!) = \left(x+\tfrac{1}{2}\right)\ln x - x + \tfrac{1}{2}\ln(2\pi) + \frac{1}{12x} - \frac{1}{360x^3} + \frac{1}{1260x^5} - \frac{1}{1680x^7} + \cdots$$ [14:48]

and Forsyth's

$$\Gamma(x+1) = \sqrt{2\pi}\left[\frac{\sqrt{x^2+x+\tfrac{1}{6}}}{e}\right]^{x+\frac{1}{2}}$$ [14:49]

are highly reliable. If x is large the error in Forsyth's formula is
less than $1/(240x^3)$ of the whole. (See Pearson 1895 and 1901).
Even for x = 1.5 the error is only in the neighborhood of .1 per
cent. (See Kelley 1923 p. 136).
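
The reliability of the two approximations can be examined against a library value. The sketch below assumes the forms of Stirling's series (through the term in the seventh power) and of Forsyth's formula as written out above, and uses Python's `math.lgamma` and `math.gamma` as the standard of comparison; the function names are hypothetical.

```python
import math

def ln_factorial_stirling(x):
    """Natural logarithm of x! by Stirling's series through the x**7 term."""
    return ((x + 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi)
            + 1 / (12 * x) - 1 / (360 * x**3)
            + 1 / (1260 * x**5) - 1 / (1680 * x**7))

def factorial_forsyth(x):
    """x! = Gamma(x+1) by Forsyth's approximation."""
    return (math.sqrt(2 * math.pi)
            * (math.sqrt(x * x + x + 1 / 6) / math.e) ** (x + 0.5))

def forsyth_relative_error(x):
    """Error of Forsyth's value as a fraction of the true value."""
    return factorial_forsyth(x) / math.gamma(x + 1) - 1

stirling_error_at_10 = ln_factorial_stirling(10) - math.lgamma(11)
forsyth_error_at_10 = forsyth_relative_error(10)
forsyth_error_at_1_5 = forsyth_relative_error(1.5)
```

At x = 10 the bound 1/(240x^3) is about .000004, and at x = 1.5 it is about .0012, so the computed errors may be set beside these figures.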

A function frequently needed in curve fitting is
$\Gamma(x+1)/(x^x e^{-x})$. The logarithm of this to the base 10 as
given by Pearson (1934) is

$$\log\frac{\Gamma(x+1)}{x^x e^{-x}} = .3990899 + \tfrac{1}{2}\log x + .080929\,\sin\frac{25^\circ.623}{x}$$ [14:50]

The incomplete gamma function: If the upper limit of the integral
[14:44] is changed from $\infty$ to y, we have an "incomplete gamma
function," which has been tabled by Pearson. This is important in
that this is the probability integral of a Pearson Type III
distribution, the equation being written with the origin at the mode.
The Type III distribution has been found to be an excellent
approximation to much biological and physical phenomena. One
illustration is the distribution of yearly maximum flow of water in a
river.


SECTION 7. THE NUMERICAL SOLUTION OF HIGHER DEGREE 
PARABOLIC AND OF EXPONENTIAL EQUATIONS 


So many procedures have been employed that the 
student may expect to find in the mathematical 
literature (see Scarborough, 1930, Ch. IX and 
Whittaker and Robinson, 1924, Ch. VI) a method 
adapted to his particular problem. Only two are 
given here, the first being serviceable in the 
case of a rapidly convergent power series and the 
second when the function can be expressed as the 
sum or the product of two functions, each suf- 
ficiently simple to permit of rapid plotting. 

If

$$y = a + bx + cx^2 + dx^3 + ex^4 + \text{negligible terms}$$ [14:51]

we let $x_1$ be a first estimate of the correct value. Substitute
$x_1$ in all terms of [14:51] beyond the x term and solve for x,
calling the value obtained $x_2$, a second estimate. Substitute
$x_2$ in terms beyond the x term, solve, and call the value obtained
$x_3$, a third estimate. Continue the iterative process until two
successive values, say $x_i$ and $x_j$, differ by a small, but not
quite negligible amount. Compute $y_i$ and $y_j$:

$$y_i = a + bx_i + cx_i^2 + dx_i^3 + ex_i^4$$ [14:52]

$$y_j = a + bx_j + cx_j^2 + dx_j^3 + ex_j^4$$ [14:53]

Then an interpolated value, x, from the following table in which all
values except x are known, is the required answer.


$y_i$    $x_i$
$y$      $x$
$y_j$    $x_j$


Generally, if y is not intermediate between $y_i$ and $y_j$, further
iterations are desirable before interpolating to obtain the final
value of x.
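
The iterative procedure, followed by the final interpolation, might be sketched as follows. This is a minimal illustration under stated assumptions: the series is taken through the fourth power only, the stopping difference `tol` and the coefficients of the example are arbitrary choices, and the function name is hypothetical.

```python
def solve_power_series(y, a, b, c, d, e, x1, tol=1e-4, max_iter=100):
    """Solve y = a + b x + c x^2 + d x^3 + e x^4 for x, given a first estimate x1."""
    def f(x):
        return a + b * x + c * x**2 + d * x**3 + e * x**4

    previous = x1
    for _ in range(max_iter):
        # substitute the estimate in all terms beyond the x term, solve for x
        current = (y - a - c * previous**2 - d * previous**3 - e * previous**4) / b
        if abs(current - previous) < tol:
            y_prev, y_curr = f(previous), f(current)
            if y_curr == y_prev:
                return current
            # linear interpolation between the last two estimates
            return previous + (y - y_prev) * (current - previous) / (y_curr - y_prev)
        previous = current
    raise RuntimeError("iteration did not settle; the series may not converge rapidly enough")

# Illustration: the coefficients below make x = .5 the exact answer.
target_y = 1 + 2 * 0.5 + 0.1 * 0.5**2 + 0.01 * 0.5**3 + 0.001 * 0.5**4
x = solve_power_series(target_y, 1, 2, 0.1, 0.01, 0.001, x1=0.4)
```

The method succeeds here because the terms beyond the x term are small beside bx; where they are not, successive estimates may wander instead of closing in.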

If f(x) = 0 is to be solved for x, it will frequently be possible to
write

$$f(x) = \phi(x) - \psi(x) = 0$$ [14:54]

in which $\phi(x)$ and $\psi(x)$ are not difficult to plot. The
intersection, or intersections, of the two curves give values of x
for which f(x) = 0. After a point of intersection has been
approximately located from the graphs, select two values $x_1$ and
$x_2$ upon opposite sides of this point and as close to it as
possible and compute $\phi(x_1)$, $\phi(x_2)$, $\psi(x_1)$, and
$\psi(x_2)$. We desire a value x, between $x_1$ and $x_2$ such that
$\phi(x) = \psi(x)$. If the points $x_1$ and $x_2$ are sufficiently
close, the curves in this region may be represented by two
intersecting straight lines, and from similar triangles it is readily
derived that

$$x = \frac{[\psi(x_1) - \phi(x_1)]\,(x_2 - x_1)}{-\phi(x_1) + \phi(x_2) + \psi(x_1) - \psi(x_2)} + x_1$$ [14:55]

A substitution of x into $\phi(x)$ and $\psi(x)$ will disclose the
accuracy of the solution.
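
The interpolation between two intersecting straight lines can be tried on a simple case. In the sketch below the two functions, $e^{-x}$ and x, and the bracketing values .56 and .57 are illustrative choices only, as is the function name.

```python
import math

def intersection_estimate(phi, psi, x1, x2):
    """Abscissa at which straight lines through the two computed points
    of phi and of psi cross (x1 and x2 should bracket the intersection)."""
    gap1 = phi(x1) - psi(x1)
    gap2 = phi(x2) - psi(x2)
    return x1 + (x2 - x1) * (-gap1) / (gap2 - gap1)

phi = lambda x: math.exp(-x)
psi = lambda x: x

x = intersection_estimate(phi, psi, 0.56, 0.57)
residual = phi(x) - psi(x)    # substitution back discloses the accuracy
```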
If f(x) can be written as

$$f(x) = \phi(x)\,\psi(x)$$ [14:56]

we can employ logarithms, defining two new functions

$$P(x) = \log\phi(x) - \log f(x), \qquad Q(x) = -\log\psi(x)$$ [14:57]

which are related as in [14:54].

$$P(x) - Q(x) = 0$$

Solving as before we obtain x which satisfies [14:56].


THE NUMERICAL SOLUTION 
OF COMPLICATED SIMULTANEOUS EQUATIONS 


The numerical solution of simultaneous equations which cannot readily be written in the form y = f(x), can frequently be rapidly accomplished by iterative processes. If two equations are functions of x and y, they can generally be written in the following form in a number of ways:

$$x = g(x, y) \qquad [14:58]$$

$$y = F(x, y) \qquad [14:59]$$

An iterative process upon these equations may, or may not, converge in the neighborhood of some solution, x, y. Graphs of [14:58] and [14:59] reveal, let us say, that $x = x_1$ and $y = y_1$ is an approximate solution. A second approximation is $x_2, y_2$, given by

$$x_2 = g(x_1, y_1) \qquad [14:60]$$

$$y_2 = F(x_1, y_1) \qquad [14:61]$$

A third approximation is $x_3, y_3$,

$$x_3 = g(x_2, y_2) \qquad [14:62]$$

$$y_3 = F(x_2, y_2) \qquad [14:63]$$

This process is to be continued until two successive sets of values differ negligibly. If convergence is not present, a different segregation of terms may be made so that we have $x = \phi(x, y)$ and $y = \psi(x, y)$ in lieu of [14:58] and [14:59]. Upon successive iteration these may lead to convergence. For iteration in the case of two such functions, say [14:58] and [14:59], to be convergent it is necessary that

$$\left|\frac{\partial g}{\partial x}\right| + \left|\frac{\partial F}{\partial x}\right| < 1 \qquad [14:64]$$

and

$$\left|\frac{\partial g}{\partial y}\right| + \left|\frac{\partial F}{\partial y}\right| < 1 \qquad [14:65]$$

in the neighborhood of $x_1, y_1$. This test for convergence can be made prior to iteration, or omitted if one prefers to discover the outcome by trial.
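A minimal sketch of such an iteration follows. It assumes that both new values are computed from the preceding pair; the helper name `iterate_pair` and the example equations are hypothetical, chosen so that the partial derivatives are small and convergence is assured.

```python
import math


def iterate_pair(g, F, x, y, tol=1e-12, max_iter=200):
    """Repeat x <- g(x, y), y <- F(x, y) until successive pairs differ negligibly."""
    for _ in range(max_iter):
        x_new, y_new = g(x, y), F(x, y)  # both computed from the preceding pair
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    raise RuntimeError("no convergence; try a different segregation of terms")


# Illustrative system (assumed): x = 0.5 cos y, y = 0.5 sin x.
g = lambda x, y: 0.5 * math.cos(y)
F = lambda x, y: 0.5 * math.sin(x)

x, y = iterate_pair(g, F, 0.5, 0.2)
```

When the routine raises its error, the remedy suggested in the text applies: rewrite the equations with a different segregation of terms and iterate again.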


SECTION 8. TRANSFORMING RANK AND PERCENTAGE POSITIONS
INTO QUANTITATIVE SCORES


Normalizing a distribution: In many situations in which a rank order, a percentage, or proportion position is given it is desired to treat the data quantitatively. In general, if $p_1$, $p_2$, and $p_3$ are three proportions, the numerical values of the differences $(p_1 - p_2)$ and $(p_2 - p_3)$ are an inaccurate quantitative statement of the underlying variables which yielded the three proportions. For example, if in a given trait, individual A exceeds $p_1$ of the members of his group, individual B exceeds $p_2$ and individual C exceeds $p_3$, and if the distribution of talent in the group is normal, the correct relationship is not $(p_1 - p_2)$ to $(p_2 - p_3)$, but $(x_1 - x_2)$ to $(x_2 - x_3)$ in which the x's are the deviates in a normal distribution corresponding to the p's. If, for each p, an x is abstracted from a table of a unit normal distribution, the variable thus obtained will have many properties which fit it for quantitative study. This process is called normalizing a distribution. There obviously should be some plausibility in the assumption of near normality of the underlying trait before it is done. Even so, the x-variables resulting do not have all the characteristics of the quantitative measure in the case of a normally distributed trait.
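As a hedged illustration of normalizing, the sketch below replaces the table of the unit normal distribution by Python's standard-library `statistics.NormalDist`. The three proportions are invented for the example.

```python
from statistics import NormalDist


def normal_deviate(p):
    """Deviate x of the unit normal distribution below which the proportion p lies."""
    return NormalDist().inv_cdf(p)


# Illustrative proportions (assumed): three individuals exceed .50, .84 and .98 of the group.
proportions = [0.50, 0.84, 0.98]
deviates = [normal_deviate(p) for p in proportions]

# Equal-looking or unequal-looking gaps in p need not match the gaps in x.
p_gaps = [proportions[1] - proportions[0], proportions[2] - proportions[1]]
x_gaps = [deviates[1] - deviates[0], deviates[2] - deviates[1]]
```

Here the second gap in p is much the smaller of the two, while the second gap in x is slightly the larger, which is the point the paragraph makes.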

If a test is equally reliable throughout a certain range, scores received by two individuals taking the test, whether low, intermediate, or high, have the same standard error which appropriately is called the standard error of measurement. In general, two proportions, $p_1$ and $p_2$, have unequal sampling errors, for, as given by [13:107], these standard errors are $\sqrt{p_1q_1/N}$ and $\sqrt{p_2q_2/N}$ respectively. Also the equivalent $x_1$ and $x_2$ deviates have unequal sampling errors, for as derived from [4:03], these standard errors are $\sqrt{p_1q_1/(Nz_1^2)}$ and $\sqrt{p_2q_2/(Nz_2^2)}$ respectively. Though the normalizing of ranks, or a series of proportions, or of percentages is known frequently to be very useful, nevertheless a transformation of such ranks, proportions, or percentages into scores which are equally reliable will clearly have value in cases where a single concept of the reliability of a score due to sampling is crucial in interpretation.

Transforming rank or percentage position into equally reliable deviation scores: Let us consider a transformation of p into x, such that $\sigma_x$ (the standard error of each x, not the standard deviation of the distribution of the x's) is independent of the value of p, that is, $\sigma_x = \sqrt{c/N}$, in which c is a constant and N the size of the sample yielding p. (An illustration follows in which N is the number of items comprising a test.) The variable p is the observed value whose standard error is known, namely $\sqrt{pq/N}$, and the standard error which attaches to x is to be derived from that which attaches to p. Let

$$x = f(p) \qquad [14:66]$$

$$dx = f'(p)\,dp \qquad [14:67]$$

and immediately

$$\sqrt{\frac{c}{N}} = f'(p)\,\sigma_p = f'(p)\sqrt{\frac{pq}{N}} \qquad [14:68]$$

or

$$f'(p) = \sqrt{\frac{c}{p - p^2}} \qquad [14:69]$$

Integrating

$$x = \left(2\sin^{-1}\sqrt{p} - \frac{\pi}{2}\right)\sqrt{c} + K \qquad [14:70]$$

If K, the constant of integration, be set equal to 0, and c set equal to 1, we have

$$x = 2\sin^{-1}\sqrt{p} - \frac{\pi}{2} \qquad [14:71]$$

Then x takes values from $-\pi/2$ to $\pi/2$ as p takes values from 0 to 1, and $\sigma_x$, no matter the value of x, is a function of the size of the sample only and is equal to $1/\sqrt{N}$. This $\sigma_x$ is the standard error of each x and not the standard deviation of the distribution of x's, which takes a range of values depending upon the distribution of the p's.
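The constancy of the standard error can be checked numerically. The sketch below is an illustration only: the names `arcsine_score` and `approx_standard_error` are hypothetical, and the slope f'(p) is obtained by a central difference rather than from [14:69].

```python
import math


def arcsine_score(p):
    """Transformation [14:71]: x = 2 arcsin(sqrt(p)) - pi/2."""
    return 2.0 * math.asin(math.sqrt(p)) - math.pi / 2.0


def approx_standard_error(p, N, h=1e-6):
    """Standard error of x as slope f'(p) times the standard error of p."""
    slope = (arcsine_score(p + h) - arcsine_score(p - h)) / (2.0 * h)
    return slope * math.sqrt(p * (1.0 - p) / N)


# With N = 100 the standard error should be close to 1/sqrt(100) = .1 for every p.
errors = [approx_standard_error(p, 100) for p in (0.2, 0.5, 0.9)]
```

The three proportions have quite different standard errors of their own, yet the transformed scores share the single value $1/\sqrt{N}$.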

In certain problems it is desirable to make an analysis of the variance of a set of independent proportions, but the proportions, based upon the same numbers of cases, are different and accordingly unequally reliable. If each proportion is transformed into an x by [14:71], the resulting measures are independent and equally reliable so that their further study by analysis of variance methods is simple and accurate.

As an illustration of a problem which might well use this transformation, consider a test of the fundamental operations in arithmetic, consisting of N problems, of substantially equal difficulty, administered without time limit and scored for speed of completion and again for accuracy of computation. The accuracy score is p, the proportion of answers that are correct and as explained it has unequal reliability from subject to subject. If for each p we employ [14:71] or Table XIV B to obtain x, we now have a score with a constant standard error $1/\sqrt{N}$. This happy outcome has been accomplished at a certain expense. The distribution of the x's is presumably not a correct representation of the true distribution of underlying arithmetical computation ability. Specifically, had this true distribution been normal neither the distribution of p's nor that of the x's would be normal. How serious this is becomes an appropriate topic for investigation in each particular instance.

The [14:71] transformation may be useful when the original data consists of ranks, or equivalent percentages, or proportions. If N subjects are ranked in an order of merit, the proportion of cases falling below a person is his p-score. In computing this the proportion represented by the person himself is split into halves, one-half being added to the proportion below him and one-half to the proportion above him; thus the person ranking lowest has a p-score = .5/N; the second lowest a p = 1.5/N; the third lowest a p = 2.5/N; etc. These p-scores form a rectangular distribution which we may treat as continuous if N is at all substantial. Whereas in the arithmetic

TABLE XIV B (table body not reproduced here; the tabled values are negative for p < .5 and positive for p > .5)


P = 1. We now know the form of the distribution 
of P and if the [14:71] transformation is made 
we can determine the form of distribution of x. 
The equation of the distribution of p, so written that the total area is 1, is

$$z_p = 1 \quad (\text{from } p = 0 \text{ to } p = 1) \qquad [14:72]$$

Differentiating [14:71]

$$\frac{dx}{dp} = \frac{1}{\sqrt{p - p^2}} \qquad [14:73]$$

Let the number of cases in the interval dx be $N_x$ and the number in the interval dp be $N_p$.

$$z_x = \frac{N_x}{dx} \qquad [14:74]$$

is the ordinate in the x-distribution, and

$$z_p = \frac{N_p}{dp} = 1 \qquad [14:75]$$

is the ordinate in the p-distribution. The element of area $z_x\,dx$ equals the corresponding element of area $z_p\,dp$, that is $z_x\,dx = dp$. Utilizing [14:73] we have

$$z_x = \sqrt{p - p^2} \qquad [14:76]$$

Obtaining p from [14:71] and substituting we get the sine-cosine distribution

$$z_x = \sin\left(\frac{x}{2} + \frac{\pi}{4}\right)\cos\left(\frac{x}{2} + \frac{\pi}{4}\right), \quad -\frac{\pi}{2} \le x \le \frac{\pi}{2} \qquad [14:77]$$


This is a symmetrical distribution, with limits in x of $-\pi/2$ and $\pi/2$, of total area 1, having a mean of zero, a variance of $(\pi^2 - 8)/4$, a Pearson $\beta_1$ of 0, a Pearson $\beta_2$ of 2.19375

$$\left[\,= \frac{\pi^4 - 48\pi^2 + 384}{\pi^4 - 16\pi^2 + 64}\right],$$

an area between any two values as given by [14:78], and a standard error of every x, if derived from a p with standard error of $\sqrt{pq/N}$ (which, presumably will seldom be the case), of $1/\sqrt{N}$. The area between $x_1$ and $x_2$ is given by

$$\int_{x_1}^{x_2} z_x\,dx = \left[\sin^2\left(\frac{x}{2} + \frac{\pi}{4}\right)\right]_{x_1}^{x_2} = \tfrac{1}{2}\left(\sin x_2 - \sin x_1\right) \qquad [14:78]$$


If a table of sines for circular arguments is not available, x may be expressed in terms of degrees employing the relationship $\pi$ radians = 180 degrees, or

57.29577 951 x = corresponding number of degrees [14:79]

.01745 32925 (degrees) = corresponding number of radians [14:80]


SECTION 9. THE SQUARE ROOT TRANSFORMATION
(See Bartlett 1935)


In the situation wherein independent variables are normally distributed a sum (or weighted sum by introducing weights), as given by [14:81], (see Chapter VIII, Section 10)

$$X = X_1 + X_2 + \cdots + X_k \qquad [14:81]$$

is distributed normally, and such an estimate as X/k of the mean of the right hand member X's is an efficient estimate, if these X's are equally reliable. The analysis of variance proceeds via the equation,

Ven VOD ИЕ у, - 

If the independent X's are Poisson distributed a more efficient prediction equation than one predicting X/k is one predicting $\sqrt{X}/k$, thus, if
р Е S p oa ngu 

Ох - 

V(X) КГУ 4) S 10 53 V(Y,)04:82] 


Though this is not a totally adequate transfor- 
mation* (See Cochran 1940) it has such merit as 
to suggest its use when the original variables 
are Poisson distributed and an analysis of vari- 
ance is desired. 


SECTION 10. CUBE ROOT TRANSFORMATION


The Wilson-Hilferty transformation is a linear function of a cube root. It very nearly normalizes the $\chi^2$ distribution. As employed in Chapter IX, Section 5 it is the basis for very nearly normalizing variance ratio distributions. As these distributions are to be found in all realms of social, biological, and physical statistics the cube root transformation may be expected to have wide utility.
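For orientation, the usual statement of the Wilson-Hilferty approximation (not reproduced in this section) is that $(\chi^2/n)^{1/3}$ is very nearly normal with mean $1 - 2/(9n)$ and variance $2/(9n)$, n being the degrees of freedom. The sketch below assumes that form; the function name is hypothetical.

```python
import math


def wilson_hilferty_z(chi_square, dof):
    """Approximate unit normal deviate for a chi-square value with dof degrees of freedom."""
    k = 2.0 / (9.0 * dof)
    cube_root = (chi_square / dof) ** (1.0 / 3.0)
    return (cube_root - (1.0 - k)) / math.sqrt(k)


# 18.307 is the familiar upper 5 per cent point of chi-square for 10 degrees of freedom,
# so the result should fall close to the normal deviate 1.645.
z = wilson_hilferty_z(18.307, 10)
```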

SECTION 11. CERTAIN PROPERTIES OF DIFFERENCES IN
TABLED ENTRIES

We employ the notation of Chapter XIII, Section 12.

If in a table all t's except $t_0$ are correct and $t_0$ is too large by the amount $\epsilon$, the errors in the differences take the pattern:

* Cochran refines his $\sqrt{X}$ values, by an adjustment which is small except in the case of an X that equals 0.

It will be noticed that the largest error in the even $\Delta$'s is opposite the tabled entry which is in error. When an error in some t value is suspected it is frequently possible to locate it by thus studying the differences.

The following formula, noted by Cosens (1944) is particularly useful to check an entry, $t_0$, which one questions, by determining what this entry would be based on the six neighboring entries.

$$t_0 = .75(t_{-1} + t_1) - .30(t_{-2} + t_2) + .05(t_{-3} + t_3) - .05\,\Delta^{VI}_{-3} \qquad [14:83]$$

When the recorded $t_0$ is presumably in error the value $\Delta^{VI}_{-3}$ will, presumably, be greatly in error. If neighboring portions of the table warrant the belief that $\Delta^{VI}$ is negligibly small, the $\Delta^{VI}$ term in [14:83] may be dropped.
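Both ideas of this section can be tried on a small made-up table. The sketch below is illustrative: the table of cubes, the size of the planted error, and the helper names are all assumptions. It shows how one wrong entry spreads through the fourth differences with binomial coefficients, and how the six neighbors recover the entry when the sixth difference is negligible.

```python
from fractions import Fraction


def differences(values, order):
    """Return the column of differences of the given order."""
    row = list(values)
    for _ in range(order):
        row = [b - a for a, b in zip(row, row[1:])]
    return row


def neighbor_estimate(t, i):
    """Estimate entry i from its six neighbors, the sixth-difference term being dropped."""
    return (Fraction(3, 4) * (t[i - 1] + t[i + 1])
            - Fraction(3, 10) * (t[i - 2] + t[i + 2])
            + Fraction(1, 20) * (t[i - 3] + t[i + 3]))


clean = [k ** 3 for k in range(11)]  # cubes: fourth and higher differences vanish
faulty = list(clean)
faulty[5] += 7                       # plant an error of 7 in one entry

error_pattern = differences(faulty, 4)  # error times 1, -4, 6, -4, 1 around the bad entry
recovered = neighbor_estimate(faulty, 5)
```

The largest disturbance in the fourth differences sits opposite the faulty entry, and the neighbor formula returns the correct cube, 125.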


SECTION 12. EXPANDING A TABLE BY INTERPOLATING VALUES 


Let us be given a table with entries t for equally spaced arguments a and let it be desired to interpolate (n - 1) additional values between a t value and a neighboring t value. For example, if the arguments run .00, .01, .02, . . . with corresponding tabled entries, it might be desired to expand the table so that the arguments run .000, .001, .002, . . . with corresponding tabled entries. The interval in the expanded table is 1/10 that in the original table. In this case n = 10 and 9 additional values are to be interpolated between existing values. The tabled entries have, of necessity, an error which may be as large as 1/2 in the last decimal place recorded, so these tabled entries should be accurate to one or more, preferably more, decimal places farther than is required in the interpolated values.

Differences in the original table are computed to the first order that becomes negligibly small. We will assume this to be the sixth order, so that $\Delta^{VI} = 0$, and illustrate the solution for this case. If $\Delta$'s represent the differences in the original table and $\delta$'s in the expanded table, the relationships between them are given herewith. For simplicity the subscript 0, or any other subscript which remains constant, which attaches to every $\delta$ and $\Delta$, has been omitted.

$$\delta^{VI} = \frac{\Delta^{VI}}{n^6} \qquad [14:84]$$

$$\delta^{V} = \frac{\Delta^{V}}{n^5} \qquad [14:85]$$

$$\delta^{IV} = \frac{\Delta^{IV}}{n^4} - \frac{2(n-1)}{n^5}\,\Delta^{V} \qquad [14:86]$$

$$\delta^{III} = \frac{\Delta^{III}}{n^3} - \frac{3(n-1)}{2\,n^4}\,\Delta^{IV} + \frac{(n-1)(7n-5)}{4\,n^5}\,\Delta^{V} \qquad [14:87]$$

$$\delta^{II} = \frac{\Delta^{II}}{n^2} - \frac{(n-1)}{n^3}\,\Delta^{III} + \frac{(n-1)(11n-7)}{12\,n^4}\,\Delta^{IV} - \frac{(n-1)(2n-1)(5n-3)}{12\,n^5}\,\Delta^{V} \qquad [14:88]$$

$$\delta^{I} = \frac{\Delta^{I}}{n} - \frac{(n-1)}{2!\,n^2}\,\Delta^{II} + \frac{(n-1)(2n-1)}{3!\,n^3}\,\Delta^{III} - \frac{(n-1)(2n-1)(3n-1)}{4!\,n^4}\,\Delta^{IV} + \frac{(n-1)(2n-1)(3n-1)(4n-1)}{5!\,n^5}\,\Delta^{V} \qquad [14:89]$$

Having the $\delta$'s, the interpolated values are readily obtained. A ready check for accuracy is available by employing overlapping regions and computing certain of the interpolated values twice. The preceding formulas serve by dropping off the higher order terms if some $\Delta$ of lower order than $\Delta^{VI}$ approximates zero. If it is a higher order than $\Delta^{VI}$ that approximates zero, similar formulas to the preceding can be derived.
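The relations between the two sets of differences can be exercised on a made-up table. The sketch below is an assumption-laden illustration: it takes the first through fifth leading differences of the original table, converts them to leading differences of the expanded table by the standard expansion of $(1 + \Delta)^{1/n} - 1$ truncated after $\Delta^{V}$, and rebuilds the expanded table by repeated addition. The fifth-degree polynomial and all helper names are hypothetical; exact fractions are used so that the check is not blurred by rounding.

```python
from fractions import Fraction


def leading_differences(values):
    """Leading differences of orders 1, 2, ... taken at the first entry."""
    diffs, row = [], list(values)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        diffs.append(row[0])
    return diffs


def expanded_differences(D, n):
    """Leading differences of orders 1 to 5 in the expanded table (sixth order taken as zero)."""
    D1, D2, D3, D4, D5 = (Fraction(d) for d in D[:5])
    n = Fraction(n)
    d5 = D5 / n**5
    d4 = D4 / n**4 - 2 * (n - 1) * D5 / n**5
    d3 = D3 / n**3 - 3 * (n - 1) * D4 / (2 * n**4) + (n - 1) * (7*n - 5) * D5 / (4 * n**5)
    d2 = (D2 / n**2 - (n - 1) * D3 / n**3
          + (n - 1) * (11*n - 7) * D4 / (12 * n**4)
          - (n - 1) * (2*n - 1) * (5*n - 3) * D5 / (12 * n**5))
    d1 = (D1 / n - (n - 1) * D2 / (2 * n**2)
          + (n - 1) * (2*n - 1) * D3 / (6 * n**3)
          - (n - 1) * (2*n - 1) * (3*n - 1) * D4 / (24 * n**4)
          + (n - 1) * (2*n - 1) * (3*n - 1) * (4*n - 1) * D5 / (120 * n**5))
    return [d1, d2, d3, d4, d5]


def build_table(t0, deltas, count):
    """Rebuild `count` entries from a starting value and its leading differences."""
    state = [Fraction(t0)] + list(deltas)
    out = []
    for _ in range(count):
        out.append(state[0])
        for k in range(len(state) - 1):  # each order advances by the next higher order
            state[k] += state[k + 1]
    return out


f = lambda a: a**5 - 3 * a**2 + 2        # illustrative tabled function (assumed)
coarse = [f(k) for k in range(6)]        # original table at arguments 0, 1, ..., 5
fine = build_table(coarse[0], expanded_differences(leading_differences(coarse), 10), 51)
```

Because the illustrative function is a fifth-degree polynomial, its sixth differences vanish exactly and the rebuilt entries agree with direct evaluation at every tenth of the original interval.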


SECTION 13. TRIGONOMETRIC FUNCTIONS OF SUMS AND DIFFERENCES 


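Given sin, cos, and tan of $\theta$,

$$\sin\frac{\theta}{2} = \sqrt{\tfrac{1}{2}(1 - \cos\theta)}; \qquad \cos\frac{\theta}{2} = \sqrt{\tfrac{1}{2}(1 + \cos\theta)}; \qquad \tan\frac{\theta}{2} = \frac{\sin\theta}{1 + \cos\theta} \qquad [14:90]$$

$$\sin(2\theta) = 2\sin\theta\cos\theta; \qquad \cos(2\theta) = \cos^2\theta - \sin^2\theta \qquad [14:91]$$

$$\tan(2\theta) = \frac{2\tan\theta}{1 - \tan^2\theta} \qquad [14:92]$$

Given sin, cos, and tan of $\theta$ and of $\phi$,

$$\sin(\theta \pm \phi) = \sin\theta\cos\phi \pm \cos\theta\sin\phi \qquad [14:93]$$

$$\cos(\theta \pm \phi) = \cos\theta\cos\phi \mp \sin\theta\sin\phi \qquad [14:94]$$

$$\tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi} \qquad [14:95]$$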

SECTION 14. SPACE OF TWO DIMENSIONS


A few properties of a straight line. There are various forms for the equation.

$$ax + by + c = 0 \qquad [14:96]$$

The general form.

$$y = a + bx \qquad [14:97]$$

In which a is the intercept on the y axis and b the slope of the line.

$$\frac{x}{b} + \frac{y}{a} = 1 \qquad [14:98]$$

In which a and b are the intercepts on the Y and X axes respectively.

$$\frac{y - y_1}{y_2 - y_1} = \frac{x - x_1}{x_2 - x_1} \qquad [14:99]$$

In which $(x_1, y_1)$ and $(x_2, y_2)$ are two points through which the line passes.

The perpendicular distance of the point $(x_1, y_1)$ from the line [14:96] is

$$d = \frac{ax_1 + by_1 + c}{\pm\sqrt{a^2 + b^2}} \qquad [14:100]$$

In which the radical takes the sign opposite to that of c.

Rotation of axes in two dimensional space: Let $x_1$ and $x_2$ be the original variables and $y_1$ and $y_2$ the variables after rotation. If the axes rotate counter clock-wise through the angle $\theta$, the transforming equations are

$$y_1 = x_1\cos\theta + x_2\sin\theta \qquad [14:101]$$

$$y_2 = -x_1\sin\theta + x_2\cos\theta \qquad [14:102]$$

The reverse equations, expressing the same relationship, are

$$x_1 = y_1\cos\theta - y_2\sin\theta \qquad [14:101a]$$

$$x_2 = y_1\sin\theta + y_2\cos\theta \qquad [14:102a]$$
Correlation functions of rotated variables: Let $x_1$, $x_2$, and $x_3$ be three correlated variables with means of zero, variances of $V_1$, $V_2$, and $V_3$, and covariances $c_{12}$, $c_{13}$, and $c_{23}$. Let $x_3$ remain unchanged, but $x_1$ and $x_2$ transformed into $y_1$ and $y_2$ by the rotation given by [14:101] and [14:102]. Then

$$V(y_1) = V_1\cos^2\theta + V_2\sin^2\theta + 2c_{12}\sin\theta\cos\theta \qquad [14:103]$$

$$V(y_2) = V_1\sin^2\theta + V_2\cos^2\theta - 2c_{12}\sin\theta\cos\theta \qquad [14:104]$$

$$V(y_1) + V(y_2) = V_1 + V_2 \qquad [14:105]$$

$$c(y_1y_2) = c_{12}(\cos^2\theta - \sin^2\theta) - (V_1 - V_2)\sin\theta\cos\theta \qquad [14:106]$$

$$c(y_1x_3) = c_{13}\cos\theta + c_{23}\sin\theta \qquad [14:107]$$

$$c(y_2x_3) = -c_{13}\sin\theta + c_{23}\cos\theta \qquad [14:108]$$

$$[c(y_1x_3)]^2 + [c(y_2x_3)]^2 = c_{13}^2 + c_{23}^2 \qquad [14:109]$$

If the rotation in the $x_1$, $x_2$ plane is to major and minor axes, or such as to make $V(y_1)$ a maximum, it of necessity at the same time makes $V(y_2)$ a minimum and the covariance, $c(y_1y_2)$, zero. In this case the angle $\theta$ is given by

$$\tan(2\theta) = \frac{2c_{12}}{V_1 - V_2} \qquad [14:110]$$

Tables to facilitate such rotations, giving $\sin\theta$, $\cos\theta$, $\sin^2\theta$, $\cos^2\theta$, $\sin\theta\cos\theta$, and $2\sin\theta\cos\theta$ for argument $\theta$ or argument $\tan(2\theta)$ are given in Kelley (1935).
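In place of the tables, the rotation to major and minor axes may be sketched directly. The numbers below are invented for illustration, and the two-argument arctangent is an assumption used to pick, among the angles satisfying the tangent relation, the one that makes the first rotated variance the larger.

```python
import math


def rotated_moments(V1, V2, c12, theta):
    """Variances of y1 and y2 and their covariance after rotation through theta."""
    s, c = math.sin(theta), math.cos(theta)
    Vy1 = V1 * c * c + V2 * s * s + 2.0 * c12 * s * c
    Vy2 = V1 * s * s + V2 * c * c - 2.0 * c12 * s * c
    cov = c12 * (c * c - s * s) - (V1 - V2) * s * c
    return Vy1, Vy2, cov


def principal_angle(V1, V2, c12):
    """Angle with tan(2 theta) = 2 c12 / (V1 - V2), on the branch maximizing V(y1)."""
    return 0.5 * math.atan2(2.0 * c12, V1 - V2)


# Illustrative moments (assumed): V1 = 4, V2 = 1, c12 = 1.
theta = principal_angle(4.0, 1.0, 1.0)
Vy1, Vy2, cov = rotated_moments(4.0, 1.0, 1.0, theta)
```

The rotated covariance vanishes, and the two rotated variances still sum to the original total of 5.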


SECTION 15. SPACE OF THREE OR MORE DIMENSIONS 


The general form of the equation of a plane in three-dimensional space is

$$ax + by + cz + d = 0 \qquad [14:111]$$

The intercept form in which a, b, and c are the intercepts on the x, y, and z axes, is

$$\frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1 \qquad [14:112]$$

The distance of the plane [14:111] from the point $(x_1, y_1, z_1)$ is

$$\text{distance} = \frac{ax_1 + by_1 + cz_1 + d}{\pm\sqrt{a^2 + b^2 + c^2}} \qquad [14:113]$$

in which the sign of the radical is opposite to that of d.

Two non-parallel planes intersect in a straight line. Accordingly the equations of two planes define a line.

Three planes, in general, intersect in a point 
and the equations of three planes define a point. 

For the angle between two lines, we let the 
lines be defined by 


NECI ett [14:114] 


and 


a, b DN x е [14 : 115] 
then

$$\cos\theta = \frac{a_1a_2 + b_1b_2 + c_1c_2}{\sqrt{a_1^2 + b_1^2 + c_1^2}\;\sqrt{a_2^2 + b_2^2 + c_2^2}} \qquad [14:116]$$

in which $\theta$ is the angle between the two lines.

The sine of the angle between the straight line [14:114] and the plane [14:111] is

$$\sin\theta = \frac{a_1a + b_1b + c_1c}{\sqrt{a_1^2 + b_1^2 + c_1^2}\;\sqrt{a^2 + b^2 + c^2}} \qquad [14:117]$$
SECTION 16. LAGRANGE MULTIPLIERS


These multipliers provide a method for finding the maximum or minimum of a function when one or more conditions are imposed upon the variables in the function. The general statement is as follows:

Let f be the function to be maximized, or minimized. Let $\phi_1 = 0$; $\phi_2 = 0$; . . . $\phi_k = 0$ be the conditions imposed upon the variables. $f, \phi_1, \phi_2, \ldots, \phi_k$ are functions of $x_1, x_2, \ldots, x_n$. Write

$$U = f + \lambda_1\phi_1 + \lambda_2\phi_2 + \cdots + \lambda_k\phi_k \qquad [14:118]$$

Obtain the n partial derivates and set equal to zero.

$$\frac{\partial U}{\partial x_i} = 0 \quad (i = 1, 2, \ldots, n) \qquad [14:119]$$

These n equations, together with the k equations, $\phi_1 = 0$, . . . $\phi_k = 0$, provide a set of (n + k) equations from which the $\lambda$'s may be eliminated and the maximizing, or minimizing, values of the x's obtained.

To illustrate: Let

$$f = \frac{xy}{2} \qquad [a]$$

be the area of a triangle with base x and altitude y, and let it be further imposed that

$$x + y^2 = 9 \qquad [b]$$

Required to find the dimensions yielding a triangle with maximum area. We have

$$U = \frac{xy}{2} + \lambda(x + y^2 - 9) \qquad [c]$$

$$\frac{\partial U}{\partial x} = \frac{y}{2} + \lambda = 0 \qquad [d]$$

$$\frac{\partial U}{\partial y} = \frac{x}{2} + 2\lambda y = 0 \qquad [e]$$

Solving [b], [d], and [e] simultaneously we obtain x = 6 and $y = \sqrt{3}$ for the required dimensions of the triangle.
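The triangle illustration can be checked numerically. The sketch below is an illustration only: it assumes the condition $x + y^2 = 9$, writes the two partial derivatives of $xy/2 + \lambda(x + y^2 - 9)$ by hand, and compares the stated solution with neighboring points that also satisfy the condition.

```python
import math


def area(x, y):
    """Area of a triangle with base x and altitude y."""
    return 0.5 * x * y


def condition(x, y):
    """Imposed condition, written so that it equals zero when satisfied."""
    return x + y * y - 9.0


x_star, y_star = 6.0, math.sqrt(3.0)
lam = -y_star / 2.0                        # multiplier from dU/dx = y/2 + lambda = 0

dU_dx = y_star / 2.0 + lam                 # should vanish
dU_dy = x_star / 2.0 + 2.0 * lam * y_star  # should vanish

# Nearby points that still satisfy the condition give a smaller area.
nearby = [area(9.0 - y * y, y) for y in (y_star - 0.01, y_star + 0.01)]
best = area(x_star, y_star)
```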

SECTION 17. GROWTH CURVES


a, b, and c are constants and x is time, or
age. The law of organic growth


VEU аре c c E ss 6 + [14:120] 
Makeham's force of mortality curve 
сые сс [14.121] 


The simple logistic growth curve 


a 
Vu EM RESI T. [14:122] 
bes е» 
The Gompertz growth curve
x: 
yu Е (с. [14: 123] 
А growth senescence curve 
a 
У стест ee [14: 124] 


РЇр-1)(р-2)к° 2B, + REVO p-3)(p-4)&*?B, _ 
SECTION 18. THE BINOMIAL THEOREM

$$(a+b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{2!}a^{n-2}b^2 + \frac{n(n-1)(n-2)}{3!}a^{n-3}b^3 + \cdots + \frac{n(n-1)(n-2)\cdots(n-n+1)}{n!}a^{n-n}b^n \qquad [14:125]$$

The expansion of the polynomial $(a + b + c + \cdots + k)^n$ consists of terms the sum of whose exponents is always equal to n. If for some term the exponent of a is $\alpha$, of b, $\beta$, of c, $\gamma$, . . . of k, $\kappa$, where of course $\alpha + \beta + \gamma + \cdots + \kappa = n$, this term is given by

$$\frac{n!}{\alpha!\,\beta!\,\gamma!\cdots\kappa!}\;a^{\alpha}b^{\beta}c^{\gamma}\cdots k^{\kappa} \qquad [14:126]$$


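SECTION 19. STIRLING'S APPROXIMATION TO THE FACTORIAL


$$\ln(N!) = \left(N + \tfrac{1}{2}\right)\ln(N) + \tfrac{1}{2}\ln(2\pi) - N + \frac{B_1}{1\cdot 2\cdot N} - \frac{B_2}{3\cdot 4\cdot N^3} + \frac{B_3}{5\cdot 6\cdot N^5} - \cdots \qquad [14:127]$$

The B's are Bernoullian numbers.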
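A short numerical check of the approximation follows. It assumes the older convention in which the Bernoullian numbers are all positive ($B_1 = 1/6$, $B_2 = 1/30$, $B_3 = 1/42$) and enter with alternating signs, and it compares the result with the standard-library `math.lgamma`. The function name is hypothetical.

```python
import math

# Bernoullian numbers in the older, all-positive convention (assumed here).
BERNOULLIAN = [1.0 / 6.0, 1.0 / 30.0, 1.0 / 42.0]


def stirling_ln_factorial(N, terms=3):
    """Stirling's series for ln(N!) carried to `terms` Bernoullian corrections."""
    total = (N + 0.5) * math.log(N) + 0.5 * math.log(2.0 * math.pi) - N
    for j in range(terms):
        k = j + 1
        correction = BERNOULLIAN[j] / ((2 * k - 1) * (2 * k) * N ** (2 * k - 1))
        total += correction if j % 2 == 0 else -correction
    return total


approx = stirling_ln_factorial(10)
exact = math.lgamma(11)  # ln(10!)
```

Even for N as small as 10, three correction terms bring the series very close to the exact logarithm.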


SECTION 20. THE SUM OF THE POSITIVE POWERS OF THE
FIRST k NUMBERS


$$1^p + 2^p + \cdots + k^p = \frac{k^{p+1}}{p+1} + \frac{k^p}{2} + \frac{p\,k^{p-1}B_1}{2!} - \frac{p(p-1)(p-2)\,k^{p-3}B_2}{4!} + \frac{p(p-1)(p-2)(p-3)(p-4)\,k^{p-5}B_3}{6!} - \cdots \qquad [14:128]$$

the series terminating with the term in which the exponent of k is 1 or 2. The B's are Bernoullian numbers.
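The series can be verified against direct summation with exact fractions. In this sketch the Bernoullian numbers are again taken in the older all-positive convention and listed only as far as $B_5$, so the helper (a hypothetical name) is valid for p up to 11.

```python
from fractions import Fraction
from math import factorial

# Bernoullian numbers B1..B5 in the older, all-positive convention (assumed here).
B = [Fraction(1, 6), Fraction(1, 30), Fraction(1, 42), Fraction(1, 30), Fraction(5, 66)]


def power_sum(k, p):
    """Sum of the p-th powers of 1..k from the Bernoullian series (p <= 11)."""
    total = Fraction(k ** (p + 1), p + 1) + Fraction(k ** p, 2)
    j = 1
    while p - 2 * j + 1 >= 1:          # stop once the exponent of k would fall below 1
        falling = 1
        for m in range(2 * j - 1):     # p (p-1) ... with 2j-1 factors
            falling *= p - m
        term = B[j - 1] * falling * k ** (p - 2 * j + 1) / factorial(2 * j)
        total += term if j % 2 == 1 else -term
        j += 1
    return total


check = all(power_sum(k, p) == sum(i ** p for i in range(1, k + 1))
            for p in range(1, 11) for k in range(1, 13))
```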


SECTION 21. FOURIER SERIES

$$y = a_0 + a_1\sin x + b_1\cos x + a_2\sin 2x + b_2\cos 2x + \cdots \qquad [14:129]$$

This is important in fitting curves to data showing periodicity. This is treated in E. T. Whittaker and G. Robinson, THE CALCULUS OF OBSERVATIONS, (1924) Chapter X.


SECTION 22. TAYLOR'S THEOREM


f(x) may be expanded in the neighborhood of x = a thus

$$f(x) = f(a) + (x-a)f'(a) + \frac{(x-a)^2}{2!}f''(a) + \cdots + \frac{(x-a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + R \qquad [14:130]$$

in which R is the remainder after n terms, and the successive f-primes are the successive derivates of the function evaluated for x = a. If the limit of R is zero as $n \to \infty$ we have a Taylor's series, and this is a convergent series. If a = 0 in a Taylor's series of one variable we have a Maclaurin series. Taylor's theorem also applies, with certain necessary modifications, to functions of more than one variable.


SECTION 23. THE EULER-MACLAURIN FORMULA FOR
EVALUATING A DEFINITE INTEGRAL


Let k be the number of intervals of uniform base i in the region from a to a + ki. Then the integral of f(x) dx is given by

$$\frac{1}{i}\int_a^{a+ki} f(x)\,dx = \tfrac{1}{2}f(a) + f(a+i) + f(a+2i) + \cdots + f(a+(k-1)i) + \tfrac{1}{2}f(a+ki) - \frac{B_1 i}{2!}\left[f'(a+ki) - f'(a)\right] + \frac{B_2 i^3}{4!}\left[f'''(a+ki) - f'''(a)\right] - \cdots$$

in which the B's are Bernoullian numbers.


CHAPTER XV
STATISTICAL TABLES


SECTION 1. SELECTED REFERENCES 


The tables given in later sections are likely to be too abbreviated for refined statistical work. They should prove adequate for training purposes and for preliminary work. Much more extensive tables of the sorts here given are available in The Kelley Statistical Tables, in press, 1947.

The accompanying very limited and select references indicate the sources of what are deemed to comprise certain tables important to the statistician. The arrangement is chronological, except where several tables appear under a single institutional authorship.

1814. BARLOW'S TABLES. 1814 and later. Edition of 1930 here referred to. [Generally 8 significant figures in tabled values, with an argument of 4 figures] "Squares, cubes, square roots, cube roots, and reciprocals of all integer numbers up to 10,000." The square root of N and of 10N are given. The cube root of N is given, but not that of 10N nor that of 100N. A table of powers up to the tenth of all integers from 1 to 100, and of powers up to the 20th of all integers from 1 to 10, is given.

1910. B. O. Peirce. A SHORT TABLE OF INTEGRALS. Second edition, 1910, 144 pages. This is a handy and concise source for 938 formulas: integrals, auxiliary trigonometric, hyperbolic, elliptic functions, Bessel's functions, series, products, derivatives, etc. Also contains brief tables of the probability integral, elliptic integrals, hyperbolic functions, natural and common logarithms, and trigonometric functions. Third edition revised by W. F. Osgood, 1929, 156 pages.

1914. Karl Pearson, editor. TABLES FOR STATISTICIANS AND BIOMETRICIANS, Part I. (See also Part II, 1931; Cambridge University Press, Tracts for Computers, 1919-37; and, Biometrika Publications, 1934-38).

Table II (W. F. Sheppard) [7 place consequent] Area and ordinate of the normal curve in terms of the abscissa. Interval is .01 $\sigma$. ($\tfrac{1}{2}(1+\alpha)$ = p and z = z of Table XV C herein.)

Table XII (W. Palin Elderton) [6 places] Tables for testing goodness of fit. These are 12 tables. The interval in $\chi^2$ is very coarse, being 1. It is important to note that $n'$ as used in these tables is one greater than d. o. f. The $n'$ arguments are 3 (1) 30.

Table XXVI (W. Palin Elderton) Tables to assist the calculation of the ordinates of [a type III] curve.

Table XXVII (W. Palin Elderton) Tables of powers of natural numbers 1 to 100. Also Table XXVIII Sums of powers of natural numbers 1 to 100.

Table XXX (P. F. Everitt) [4 place] Tables to facilitate the determination of tetrachoric correlation. r = .80, .85, .90, .95, and 1.00. For more complete tabulations see Pearson's Tables, Pt. II, Tables VIII and IX.

Table XXXI (J. H. Duffell) [7 places] Logarithms of the gamma function . . . from p = 1 to p = 2 [interval .01].

XXXV A Diagram to determine the type of a frequency distribution from a knowledge of the constants $\beta_1$ and $\beta_2$.

Tables XXXVII to XLVII (A. J. Rhind) Probable errors of frequency constants connected with the Pearson system of curves.

Table XLIX (Julia Bell) [Beginning with 7 place and ending with 11 place] Logarithm of factorial n from n = 1 to 1000.

Tables LI and LII concern Poisson distributions.

Table LIV (Alice Lee) Tables of the $G(r,\nu)$ integrals. $G(r,\nu) = \int_0^{\pi} \sin^r\theta\; e^{\nu\theta}\,d\theta$. Auxiliary functions leading to this integral are tabled for arguments r = 1 (1) 50 and $\phi$ = 1 (1) 45, $\phi$ being defined by $\tan\phi = \nu/r$.

1919-1939. Cambridge University Press Tracts for Computers, Karl Pearson, Editor. Department of Applied Statistics, University of London.

No. 1, 1919. (Eleanor Pairman) Tables of the Digamma and Trigamma Functions. The digamma function is defined $F(z) = \frac{d}{dz}\ln\Gamma(1+z)$, and the trigamma function is defined

$$F'(z) = \frac{d^2}{dz^2}\ln\Gamma(1+z)$$

$F$ and $F'$ are tabled for argument z = .00 (.02) 20.00.

No. 4, 1921. (Originally computed by A. M. Legendre) Tables of the Logarithms of the Complete Gamma Function to Twelve Figures. Log $\Gamma(a)$ is tabled. Argument a = 1.000 (.001) 2.000.

No. 8, 1922. (Egon S. Pearson) Table of the Logarithms of the Complete Gamma Function (for arguments 2 to 1200, i.e., beyond Legendre's range). Log $\Gamma(p)$ is tabled to 10 decimal places. Argument p = 2.0 (.1) 5.0 (.2) 70. (1) 1200.

No. 9, 1923. (John Brownlee) Log $\Gamma(x)$ from x = 1 to 50.9 by intervals of .01. [7 decimal places].

No. 13, 1926. (James Henderson) Bibliotheca Tabularum Mathematicarum, being a descriptive catalog of mathematical tables. Part I, Logarithmic Tables.

Alexander John Thompson. Logarithmetica Britannica, being a standard table of logarithms to twenty decimal places. No. 11, numbers 90,000 to 100,000 [1924]; no. 14, numbers 80,000 to 90,000 [1927]; no. 16, numbers 40,000 to 50,000 [1928]; no. 17, numbers 50,000 to 60,000 [1931]; no. 18, numbers 60,000 to 70,000 [1933]; no. 19, numbers 10,000 to 20,000 [1934]; no. 20, numbers 70,000 to 80,000 [1935]; no. 21, numbers 30,000 to 40,000 [1937]. The part for numbers 20,000 to 30,000 announced in 1945 as "at press."

No. 15, 1927. (L. H. C. Tippett) Random Sampling Numbers. 10,400 four-figure numbers.

No. 23, 1938. (L. J. Comrie) Tables of $\tan^{-1}x$ and $\log(1 + x^2)$.

No. 24, 1939. (M. G. Kendall and B. Babington Smith) Random Sampling Numbers, Second Series, 100,000 digits grouped in twos and fours and in 100 separate thousands.

1923. James W. Glover. TABLES OF APPLIED MATHEMATICS IN FINANCE, INSURANCE, STATISTICS. Part I "Values of compound interest functions to 8 places of decimals and 7 place logarithms of values of compound interest functions." Part II "Values of life insurance and disability insurance functions." Part III "Values of probability and statistical functions." Part IV "Common logarithms of numbers from 1 to 100,000 to 7 places of decimals."

1930. L. R. Salvosa. Tables of Pearson type III functions and Derivatives of Pearson type III curve. Annals of Math. Stat., Vol. 1, 1930 [six decimal places].

Table I. Areas. Argument, t, is a standard score deviate. t = -4.99 (.01) 9.99. Skewness argument, Изот.

Table II. Ordinates. Argument t = -5.49 (.01) 9.99. Skewness Е) sl.

Table III. First six derivatives. Argument t = -9.9 (.1) 14.9. Skewness $\alpha_3$ = .0 (.1) 1.1.

1931. Karl Pearson, Ed. TABLES FOR STATISTICIANS AND BIOMETRICIANS, Part II.

Table II (T. Kondo and E. M. Elderton) Abscissae, ordinates and ratios [z/p, z/q, p/z, q/z] to ten decimal places of the normal curve to each permille of frequency.

Table III (John P. Mills and B. H. Camp) [5 places] Ratio, area of tail to bounding ordinate, or [q/z] for each percent of deviate.

Tables V-VII (Alice Lee) concern tetrachoric functions.

Tables VIII-IX (Alice Lee, Margaret Moul, Ethel M. Elderton, A. E. R. Church, E. C. Fieller, J. Pretorius, and K. Pearson) [6 place] "For determining the volume of any quadrant or of any cell of a bivariate normal frequency distribution." The interval in both arguments is .1$\sigma$. A table for each of the following values of r is given: r = -.95 (.05) 1.00.

Tables XXI-XXII (L. H. C. Tippett) [7 and 6 place] "Distribution of extreme individuals and of the range in samples from a normal population." Table XXI gives distributions for N = ox epe 20, 30, 50, 100, (100) 1000. Table XXII gives mean range for N = 1 (1) 1000.

Table XXV (K. Pearson and B. Stoessiger) [7 place] Probability integral for symmetrical curves; $\beta_1$ = 0; $\beta_2$ = 1 to 3 and 3 onwards.

1932. Jack W. Dunlap and Albert K. Kurtz. HANDBOOK OF STATISTICAL NOMOGRAPHS, TABLES, AND FORMULAS.

Part I. Gives nomographs for obtaining probable errors, or standard errors, or standard deviations of means, sums, differences, frequency in a class, proportion in a class, upper or lower quartile, quartile deviation, 10-90 percentile range, r, rank correlation coefficient, and also nomographs for biserial r, regression weights based upon biserial r, correction for attenuation, effect of range upon standard deviations and reliability coefficients, intelligence quotients, percentages, etc.

Part II:

Table 72. Squares, square roots [4 decimal places], reciprocals [5 and 6 significant figures], and reciprocals of square roots [6 significant figures]. Argument 1 (1) 1000.

Table 84. "$\sqrt{1-r^2}$" [4 places] Argument .000 (.001) .999.

Table 86. "$1-r^2$" [4 places] Argument .000 (.001) .999.

Table 88. Reliability of two to ten tests, Spearman-Brown formula. [4 places] Argument и. 000.01) 6994

Table 95. Functions used in rank correlation. $d^2$ tabled for differences in rank, by .5, from 45 to 75, $N(N^2-1)$ tabled for N = 1 (1) 100. Table also useful in getting 17.

Table 98. Correlation coefficient "From the percentage of unlike signed pairs."

Part III is an extensive table of formulas as used by different authors.

1933. Leone Chesire, Milton Saffir, and L. L. 
Thurstone. COMPUTING DIAGRAMS FOR THE ТЕТВА- 
CHORIC CORRELATION COEFFICIENTS. Consisting of 
46 trivariate diagrams, which, when entered by 
means of the proportionate frequencies in two 
marginal totals and one cell frequency, yield 
tetrachoric r. Three decimal place accuracy may 
be expected. 

1933-1935. Harold T. Davis. TABLES OF THE 
HIGHER MATHEMATICAL FUNCTIONS. Vol. I, 1933. 
Vol. II, 1935. There is given throughout a dis- 
cussion of the theoretical bases for and the 
graphs and the properties of the large number of 
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functions dealt with. 

Vol. I, Parts III, IV, and V discuss, give 
tables and bibliography covering a variety of 
interpolation procedures. Five tables of log Γ
and one of 1/Γ(re^{θi}) are given. The argument
interval is small and the number of decimal 
places varies from 10 to 12 to 15. Six tables 
give or involve the Psi function, which is the 
digamma function as defined by Eleanor Pairman, 
in Tracts for Computers, No. 1, 1919. The argu- 
ment interval is small and the number of decimal 
places varies from 10 to 12 to 15 to 16 to 20.

Vol. II treats of: (a) The polygamma functions,
giving tables of trigamma, tetragamma,
pentagamma, and hexagamma functions, employing a
fairly small interval in the argument and entries
from 10 up to 20 decimal places. (b) Bernoulli
polynomials and numbers. (c) Euler polynomials
and numbers. (d) Gram polynomials. (e) Functions
of polynomial approximation.

1934-1938.  BIOMETRIKA PUBLICATIONS. Karl 
Pearson, Ed. University College, London, W. C. 1. 

Tables of the Incomplete Beta-Function. 1934.
The complete beta-function is defined

$$B(p,q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx$$

and the incomplete beta-function

$$B_x(p,q) = \int_0^x x^{p-1}(1-x)^{q-1}\,dx$$

The magnitude tabled is

$$I_x(p,q) = B_x(p,q)/B(p,q)$$

This trivariate tabling required changes in interval
in p and q, but a uniform interval in x
of .01 for the entire range x = 0 to x = 1 was
employed. Even so the tables require 494 pages.
We note the relationship between Iₓ(p,q) and the
variance ratio, Fᵢⱼ (see [9:21]): i, j, Fᵢⱼ, and
Pᵢⱼ refer to Kelley's notation and p, q, and x
to Pearson's.


8 X J E R 
> 29; j ак S x; Р,у= І,(р,9) 
It may be stated that Pᵢⱼ via these Pearson Tables
is usually one and sometimes two decimal places
more accurate than by the normalizing transformation
of Chapter IX, Section 5 herein.
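
To make the tabled magnitude concrete, the ratio of the incomplete to the complete beta-function can be evaluated numerically. The sketch below is an illustration only and is not the method by which the Pearson tables were computed: it assumes p and q are both at least 1 (so that the integrand is finite at the ends of the range), uses Simpson's rule for the incomplete integral, and obtains the complete beta-function from the standard identity B(p,q) = Γ(p)Γ(q)/Γ(p+q). The function name and the `steps` parameter are hypothetical.

```python
import math


def incomplete_beta_ratio(x, p, q, steps=2000):
    """Approximate I_x(p, q) = B_x(p, q) / B(p, q) for p, q >= 1 by Simpson's rule."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    if p < 1 or q < 1:
        raise ValueError("this sketch assumes p >= 1 and q >= 1")
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of strips

    def integrand(t):
        return t ** (p - 1) * (1.0 - t) ** (q - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    b_x = total * h / 3.0               # incomplete beta-function B_x(p, q)
    b = math.gamma(p) * math.gamma(q) / math.gamma(p + q)   # complete B(p, q)
    return b_x / b
```

For p or q below 1 the integrand is unbounded at an end point, and a different quadrature or a series expansion would be needed.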

Tables of the Incomplete Gamma Function. Re-issue, 1934.

$$I(u,p) = \frac{1}{\Gamma(p+1)} \int_0^{u\sqrt{p+1}} v^{p} e^{-v}\,dv$$

is tabled. It is the probability integral of a
Type III distribution. [7 decimal places in
Tables I and II and 8 in Table III.] Table I
arguments p = .0 (.1) 5.0 (.2) 50.0 and u = .0
(.1) 16.9. Table II arguments p = -1.00 (.05)
.00 and u = .0 (.1) 48.0. Table III is of log
I'(u,p), wherein I'(u,p) = I(u,p)/u^{p+1}. Arguments
u = .0 (.1) 1.5 and p = -1.00 (.05) .0
(.1) 10.0. Table IV provides "constants of the
skew curve y = y₀ x^p e^{-x}" and Table V gives "five
figure values of I(u,p)."

(F. N. David) Tables of the ordinates and
probability integral of the distribution of the
correlation coefficient in small samples. 1938.
Gives the distribution of r for ρ (Kelley's ?) = .0
(.1) .9, and sample size = 3 (1) 25, 50, 100, 200, 400.

1934. H. B. Dwight. TABLES OF INTEGRALS AND 
OTHER MATHEMATICAL DATA, 222 pages. Similar to, 
but somewhat more extensive than, Peirce's Table.

1935. Truman L. Kelley. ESSENTIAL TRAITS OF
MENTAL LIFE. Table XXII of Trigonometric func- 
tions used in making rotations of axes. Tabled, 
to 7 decimal places or 7 significant figures, 
are tan 2θ, cos θ, sin θ, cos²θ, sin²θ, 2 sin θ cos θ,
sin θ cos θ, for the argument θ = 0°00' (1') 12°00'
(2') 22°00' (5') 39°00' (10') 45°00'.

1938. R. А. Fisher and F. Yates. STATISTI- 
CAL TABLES FOR BIOLOGICAL, AGRICULTURAL, AND 
MEDICAL RESEARCH. 

Table III [to 3 decimals] Distribution of t
[Student's t]. Argument is n, the d.o.f., for
n = 1 (1) 30, 40, 60, 120, ∞.

Table IV [4 and 5 figures] Distribution of χ².
Argument is n, the d.o.f., for n = 1 (1) 30.

Table V (C. G. Colcord, L. S. Deming, R. A.
Fisher, H. W. Norton and F. Yates) [4 decimal
places] Distribution of z. "z may be defined as
the difference of one-half the natural logarithms
of two different estimates of variance, one
based on n₁ and the other on n₂ degrees of freedom."
Arguments are n₁ = 1 (1) 6, 8, 12, 24, ∞,
and n₂ = 1 (1) 30, 40, 60, 120, ∞. Tables for
percentage points 20, 5, 1, and .1 are given.
The second part of Table V tables [generally 3
figures] the variance ratio instead of z for the
same arguments. Cf. Merrington and Thompson,
"Tables of percentage points of the inverted
beta (F) distribution," and Thompson, "Tables of
percentage points of the incomplete beta-function,"
and also Comrie and Hartley, "Table of
Lagrangian coefficients for harmonic interpolation
in certain tables of percentage points."
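
The quoted definition of z, and its relation to the variance ratio tabled in the second part of Table V, can be put in a few lines. This is a minimal sketch under the quoted definition only; the function names are hypothetical.

```python
import math


def z_from_variances(variance_1, variance_2):
    """z as one-half the difference of the natural logarithms of two variance estimates."""
    return 0.5 * (math.log(variance_1) - math.log(variance_2))


def variance_ratio_from_z(z):
    """Variance ratio corresponding to a given z, since z = (1/2) ln(ratio)."""
    return math.exp(2.0 * z)
```

Because the two are related by a monotonic transformation, a percentage point of z and the corresponding percentage point of the variance ratio carry the same information.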

Table VII [4 and 5 places] Transformation of
r to z. Here z is defined as in [10:43]. The
argument is z = .0 (.1) 3, 4.

Table XV Latin squares. Sets up to 9 × 9 are
given.

Table XX Scores for ordinal (or ranked) data. 
Herein is given "the average deviate of the rth 
largest of N observations drawn from a normal 
distribution having unit variance." Sample 
sizes up to N = 40. 

Table XXIII Orthogonal polynomials. (R. A.
Fisher and F. Yates and V. Satakopan) From
n' = 3 to n' = 51.

Table XXVI [5 decimal places] "Natural loga- 
rithms." Arguments 1.00 (.01) 10. (1) 999. 

Table XXVII Squares. Arguments 1 (1) 999. 

Table XXVIII [5 figures] Square roots.  Argu- 
ments 100 (1) 999, and also 10.0 (.1) 99.9. 

Table XXIX [6 places] Reciprocals. Arguments 
1.00 (.01) 9.99. 

Table XXX [6 figures in the factorial and 7
decimal places in the logarithm of the factorial] 
Factorials. Arguments 1 (1) 200. 

Table XXXIII [2 figures] Random numbers. Six 
sets of 1250 each of two digit random numbers. 

1939 to 1944. NATIONAL BUREAU OF STANDARDS 
TABLES, prepared by the FEDERAL WORKS AGENCY,
Works Project Administration, for the City of 
New York, Arnold N. Lowan, Technical Director. 

Table of Natural Logarithms, [16 decimal 
places] Vol. I Integers from 1 to 50,000. Vol. 
II Integers from 50,000 to 100,000. Vol. III 
Decimal numbers from 0 to 5 by intervals of
.0001. Vol. IV Decimal numbers from 5 to 10 by 
intervals of .0001. 1941. 

Tables of the exponential function eˣ. [12 to
18 decimals] Argument x = -2.5000 (.0001) 2.500
(.001) 5.00 (.01) 10.00. 1939.

Tables of circular and hyperbolic sines and 
cosines. [9 decimal places] Argument x (in ra- 
dians) from 0 to 2 by intervals of .0001. 1939. 

Tables of sines and cosines. [8 decimal 
places] Argument x (in radians) from 0 to 25 by 
intervals of .001. 1940. 

Tables of probability functions. [15 decimal
places]

Vol. I, 1941.

$$\frac{2}{\sqrt{\pi}}\, e^{-x^2} \quad (= \sqrt{8}\, z \text{ of Table XV C herein})$$

$$\frac{2}{\sqrt{\pi}} \int_0^x e^{-\alpha^2}\, d\alpha \quad (= 1 - 2q = p - q \text{ of Table XV C herein})$$

Argument x (= x/√2 of Table XV C herein) = .0000
(.0001) 1.000 (.001) 5.946.

Vol. II, 1942.

$$\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \quad (= z \text{ of Table XV C herein})$$

$$\frac{1}{\sqrt{2\pi}} \int_{-x}^{x} e^{-\alpha^2/2}\, d\alpha \quad (= 1 - 2q = p - q \text{ of Table XV C herein})$$

Argument x (= x of Table XV C herein) = .0000
(.0001) 1.000 (.001) 8.285.
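
The equivalences noted in parentheses for the two volumes can be verified numerically. The sketch below assumes the unit normal deviate x, ordinate z, and tail area q, and relies on the standard-library function `math.erf`, which is the integral (2/√π)∫₀ˣ e^{-α²} dα; the function names are hypothetical.

```python
import math


def ordinate(x):
    """Ordinate z of the unit normal curve at deviate x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def central_area(x):
    """Area of the unit normal curve between -x and +x, i.e. 1 - 2q."""
    return math.erf(x / math.sqrt(2.0))


def tail_area(x):
    """Area q of the unit normal curve beyond the deviate x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

With the argument of the first volume taken as x/√2, its ordinate function equals √8 times the unit normal ordinate z, and its integral equals the central area 1 - 2q.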

(Herbert E. Salzer) Table of coefficients for
inverse interpolation with central differences.
[10 places] m = (y - y₀)/(y₁ - y₀). Argument m =
.000 (.001) .500. First printed in Journal of
Mathematics and Physics, Vol. 22, no. 4, Dec.,
1943.

(Herbert E. Salzer) Table of coefficients for
inverse interpolation with advancing differences.
[10 places] m as in preceding. First printed in
Journal of Mathematics and Physics, Vol. 23, no.
2, May, 1944.

Tables of Lagrangian Interpolation Coefficients,
1944. n is the number of points used in
a table having equally spaced arguments and p is
the same as herein [13:93]. The tables give exact
values, or values to 10 decimal places.

For n = 3 the argument p = -1 (.0001) 1
For n = 4 the argument p = -1 (.001) 0 (.0001) 1 (.001) 2
For n = 5 the argument p = -2 (.001) 2
For n = 6 the argument p = -2 (.01) 0 (.001) 1 (.01) 3
For n = 7 the argument p = -3 (.01) -1 (.001) 1 (.01) 3
For n = 8 the argument p = -3 (.1) 0 (.001) 1 (.1) 4
For n = 9 the argument p = -4 (.1) 4
For n = 10 the argument p = -4 (.1) 5
For n = 11 the argument p = -5 (.1) 5

Certain supplementary tables are given. These 
main tables are more extensive than Kelley's La- 
grangian interpolation coefficient tables, espe- 
cially in that they include 5, 7, 9, and 11 point 
interpolation, in that the interval in p for 
4 point is .0001 instead of .001, for 6 point is 
.001 instead of .01, for 8 point is .001 instead 
of .1, and in that 9 place decimal or exact values
are given for n = 3 whereas Kelley rounded off 
to 5 decimal places to simplify use in the usual 
situation wherein quadric interpolation will not 
yield accuracy beyond an added 4 or 5 figures 
even though coefficient values are exact. 
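
Exact n-point Lagrangian coefficients of the kind discussed here can be generated directly from their definition. The sketch below is an illustration, not the procedure used for any of the published tables: it assumes equally spaced tabled arguments placed at the integers given in `nodes` (for example -1, 0, 1, 2 for four-point work), with p measured from the node at 0, and it uses exact rational arithmetic. The function name is hypothetical.

```python
from fractions import Fraction


def lagrange_coefficients(nodes, p):
    """Exact Lagrangian coefficients c_k with f(p) ~ sum of c_k * f(node_k).

    nodes: integer positions of the tabled arguments, e.g. [-1, 0, 1, 2].
    p:     the fractional argument, given as a string or Fraction for exactness.
    """
    p = Fraction(p)
    coefficients = []
    for k, node_k in enumerate(nodes):
        c = Fraction(1)
        for j, node_j in enumerate(nodes):
            if j != k:
                c *= (p - node_j) / Fraction(node_k - node_j)
        coefficients.append(c)
    return coefficients


# Four-point coefficients for p = .01, as exact fractions.
four_point = lagrange_coefficients([-1, 0, 1, 2], "0.01")
```

Since the coefficients are exact rationals, any rounding to a stated number of places is a separate and final step.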

1939. William Fleetwood Sheppard. THE PROB- 
ABILITY INTEGRAL. Completed and edited by the 
British Association for the Advancement of Sci- 
ence, Committee for the Calculation of Mathema- 
tical Tables. Cambridge University Press. Table 
I gives F(x) (= Kelley's q/z, [8:02] and
[8:03]) to 12 decimals for argument x = .00 (.01)
10.00. Table II gives F(x) to 24 decimals for
x = .0 (.1) 10.0. Table IV gives L(x) (= -ln q)
to 16 decimals for x = .0 (.1) 10.0. Table V
gives l(x) (= log q) to 12 decimals for x = .0
(.1) 10.0. Table VI gives l(x) to 8 decimals for
x = .00 (.01) 10.00.

1941. L. J. Comrie and H. O. Hartley. Table
of Lagrangian coefficients for harmonic interpolation
in certain tables of percentage points,
Biom., Vol. 32, 1941, pp. 183-186.

1941. Catherine M. Thompson. Table of percentage
points of the χ² distribution, Biom., Vol. 32,
1941, pp. 187-191. χ², to 6 significant figures,
is given for P = .995, .99, .975, .95, .90, .75,
.50, .25, .10, .05, .025, .01, .005, and for
d.o.f. = 1 (1) 30 (10) 100.

1941. Catherine M. Thompson. Tables of percentage
points of the incomplete beta function.
Biom., Vol. 32, pp. 168-181. The function Iₓ(p,q)
is that defined by Pearson in Tables of the incomplete
beta-function. The percentage points
are the values of x which, for given values of P,
p, and q, satisfy the equation Iₓ(p,q) = P. x,
to 5 significant figures, is given for the following
values of P, of 2p [= Kelley's d.o.f. i]
and of 2q [= Kelley's d.o.f. j]: P = .50, .25,
.10, .05, .025, .01, .005. 2p = 1 (1) 10, 12,
15, 20, 24, 30, 40, 60, 120, ∞. 2q = 1 (1) 30,
40, 60, 120, ∞. Note that Fisher and Yates' Table V
(by H. W. Norton) giving percentage points
for the related function "z" for P = .20 and
.001, supplements this table. Also cf. Comrie and
Hartley, Table of Lagrangian coefficients for
harmonic interpolation in certain tables of percentage
points.

1942. Edward Charles Dixon Molina. POISSON'S 
EXPONENTIAL BINOMIAL LIMIT. Table I, individual
terms, and Table II, cumulated terms. [6 decimal
places] Argument interval varies from .001
to 1.

1942. SMITHSONIAN MATHEMATICAL TABLES, Hyper- 
bolic Functions. Fifth reprint, 1942, by George 
F. Becker and C. E. Van Orstrand. Derivatives
recorded in addition to items noted below.

Table I. [5 decimal places] Logarithms of
hyperbolic functions. Argument u = .0000 (.0001)
.100 (.001) 3.00 (.01) 6.00.

Table II. [5 decimal places] Natural hyperbolic
functions. Same arguments as in Table I.
As an illustration of the use of this table, note
that u herein is Fisher's z from r and tanh u is r.
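
The use of a hyperbolic table for this purpose rests on the relation r = tanh z, so that z is the inverse hyperbolic tangent of r. A minimal sketch with the standard library follows; the function names are hypothetical.

```python
import math


def r_to_z(r):
    """Fisher's z corresponding to a correlation coefficient r (|r| < 1)."""
    return math.atanh(r)


def z_to_r(z):
    """Correlation coefficient r corresponding to Fisher's z."""
    return math.tanh(z)
```

Entering a table of tanh with argument z thus returns r directly, and inverse interpolation in the same table returns z from r.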

Table III. [5 decimal places] Natural and 
logarithmic circular functions. Argument u =
.0000 (.0001) .100 (.001) 1.600. A supplementary
table using a coarser argument provides the func- 
tions for higher values of u. 

Table IV. Ascending and descending exponential
and log₁₀(eᵘ). For argument u is given
log₁₀(eᵘ) to 7 decimal places; eᵘ to 7 or more
significant figures; e⁻ᵘ to 7 decimal places.
Argument u = .000 (.001) 3.00 (.01) . . . 15.00.
A supplementary table using a coarser argument 
provides the functions for higher values of u. 

Table V. [6 places] Natural logarithms. Argument
u = 0 (1) 1000 and thence by irregular arguments
to 10,000. The same table is serviceable to
obtain eˣ and e⁻ˣ for irregular arguments in x.

Table VI. [7 decimal places] The Gudermannian.
Argument u = .000 (.001) 3.00 (.01) 6.00.
Table VII gives the "Anti-Gudermannian."

1942-1943. Egon S. Pearson and H. O. Hartley.
The probability integral of the range in samples
of N observations from a normal population.
Biom., Vol. 32, 1942, pp. 301-310. The arguments
are N = 2 (1) 20, and W, the ratio of a
specific range to the population standard deviation,
= .00 (.05) 7.25. The tabled entry is, to
4 decimal places, the cumulated frequency to the
(N, W) point in question. The same authors in
Tables of the probability integral of the studentized
range, Biom., Vol. 33, 1943, pp. 89-99,
employ a second sample to secure an independent
estimate of the population standard deviation,
and thence tables for calculating the
desired probability integral.

1943. M. Merrington and Catherine M. Thompson.
Tables of percentage points of the inverted
beta (F) distribution. Biom., Vol. 33, 1943,
pp. 73-88. F is Snedecor's F (Kelley's Fᵢⱼ), the
variance ratio. F is tabled, to 5 significant
figures, for P = .50, .25, .10, .05, .025, .01,
.005, for ν₁ (Kelley's d.o.f. i) = 1 (1) 10, 12,
15, 20, 24, 30, 40, 60, 120, ∞, and for ν₂ (Kelley's
d.o.f. j) = 1 (1) 30, 40, 60, 120, ∞. Note that
Fisher and Yates' Table V (by H. W. Norton),
giving percentage points for P = .20 and .001,
supplements this table. Also cf. Comrie and Hartley,
Table of Lagrangian coefficients for harmonic
interpolation in certain tables of percentage
points.

1944. BESSEL FUNCTIONS. For references to 
the extensive tables of these functions and to 
literature about them see Harry Bateman and Raymond
Claire Archibald, A Guide to Tables of
Bessel Functions, Mathematical Tables and Other 
Aids to Computation, Vol. I, No. 7, July, 1944. 

In press, 1947. The Truman Lee KELLEY STATIS- 
TICAL TABLES. 

Table I. Eight place normal distribution,
simple correlation, and probability functions.
The argument p = .5000 (.0001) .9999, or its
complement, q = 1 - p, may variously be considered
the proportionate area in the tail of a normal
distribution, a probability, or a positive correlation
coefficient, a sine, or a cosine of an
angle, etc. Tabled are x and z, the deviate and
ordinate in a unit normal distribution, √(pq),
√(1 - p²), √(1 - q²), and also the errors in linear
and in quadric interpolation, and, for p and q,
the errors in inverse linear and quadric
interpolation. A supplementary table for
p > .9999 is given. Table XV C herein is an
abridgement and modification of this table.

Table II. Five place [fifth place rounded
off] three-point interpolation coefficients.
Argument p (a proportionate distance between two
tabled arguments in any table having equally
spaced arguments) = .0000 (.0001) .5000. Table
XV A herein is an abridgement of this table.

Table III. Seven place [seventh place com- 
pensatorially rounded off] four-point interpola- 
tion coefficients. Argument p = .000 (.001) .500. 
Table XV B herein is an abridgement of this table.

Table IV. Ten place [tenth place rounded off]
six-point interpolation coefficients. Argument
p = .00 (.01) .50.

Table V. Eleven place (exact) eight-point interpolation
coefficients. Argument p = .0 (.1) 1.0.

Table VI. Four-place χ² functions. Argument
χ²/√(d.o.f.) = .0 (.1) 4.1. P, the probability
that a divergence as great as χ² will arise as
a matter of chance, is tabled for 1 (1) 10, 12,
15, 19, 24, 30 d.o.f. Also direct and inverse
interpolation errors.

Table VII. Eight place square roots, cube 
roots (for all three place numbers) and natural 
logarithms, for N = 1.0 (.01) 10.0. Table XV D 
herein is an abridgement of this table. 

Table VIII. Eight place θ functions for
normalizing Fᵢⱼ, the variance ratio. Argument
i (= d.o.f.) = 1 (1) 100 (10) 200 (20) 300 (50)
500 (100) 1000, ∞. Table IX E herein is an
abridgement of this table.


SECTION 2. LAGRANGIAN INTERPOLATION COEFFICIENTS 


Table XV A, which is a 90 per cent abridgement
of the three-point Lagrangian interpolation
coefficients given in The Kelley Statistical
Tables (in press, 1947), gives, for p = .001
(.001) .500, the c₋₁, c₀, and c₁ coefficients of
[13:98a], with such compensatory rounding off
to five decimal places as to make the rounding off
error very small. (See Kelley, Tables, in press
1947.) When p < .5, coefficients and tabled entries
t₋₁, t₀, and t₁ are multiplied and summed as in
[13:98a], and when p > .5 then q is the desired
fraction and computation proceeds in the reverse
direction using t₁, t₀, and t₋₁.

Table XV B, a 90 per cent abridgement of the
four-point coefficients given in The Kelley
Statistical Tables, gives, for p = .00 (.01) .50,
the c₋₁, c₀, c₁, c₂ coefficients of [13:99a].
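
The entries of Table XV A can be regenerated, and the use of the table illustrated, with a short routine. The sketch below rests on two assumptions that are not stated in the text: that [13:98a] is the usual three-point Lagrangian formula, with c₋₁ = -p(1 - p)/2, c₀ = 1 - p², and c₁ = p(1 + p)/2, and that the compensatory rounding consists in rounding p²/2 once and forming all three coefficients from it, so that they sum to unity exactly. The second assumption is inferred from the tabled values. The relabelling of entries for p > .5 is likewise one reading of the reverse-direction rule, and all names are hypothetical.

```python
def three_point_coefficients(k):
    """Magnitudes (|c_-1|, c_0, c_1), in units of .00001, for p = k/1000.

    Assumed rule: p**2 / 2 is rounded once to five decimals (r below), and the
    coefficients are formed from it, so that -|c_-1| + c_0 + c_1 = 1 exactly.
    """
    if not 0 <= k <= 500:
        raise ValueError("k must lie in 0..500 (p from .000 to .500)")
    r = (k * k + 10) // 20      # nearest integer to k**2 / 20; an exact half cannot occur
    return 50 * k - r, 100000 - 2 * r, 50 * k + r


def interpolate(t_before, t_0, t_after, k):
    """Three-point interpolation at p = k/1000 beyond t_0, toward t_after (p <= .5)."""
    c_before, c_0, c_after = three_point_coefficients(k)
    return (-c_before * t_before + c_0 * t_0 + c_after * t_after) / 100000.0


def interpolate_between(t_minus1, t_0, t_1, t_2, k):
    """Interpolate at fraction k/1000 of the way from t_0 to t_1, for any k in 0..1000.

    For p > .5 the fraction q = 1 - p is used and the entries are taken in
    reverse order, starting from t_1.
    """
    if k <= 500:
        return interpolate(t_minus1, t_0, t_1, k)
    return interpolate(t_2, t_1, t_0, 1000 - k)
```

For example, `three_point_coefficients(30)` returns `(1455, 99910, 1545)`, corresponding to the tabled .01455, .99910, and .01545 for p = .030.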
TABLE XV A
THREE-POINT INTERPOLATION COEFFICIENTS

  p     c₋₁      c₀      c₁    |    p     c₋₁      c₀      c₁
         -        +       +    |           -        +       +
.000  .00000  1.00000  .00000  |  .030  .01455   .99910  .01545
.001  .00050  1.00000  .00050  |  .031  .01502   .99904  .01598
.002  .00100  1.00000  .00100  |  .032  .01549   .99898  .01651
.003  .00150  1.00000  .00150  |  .033  .01596   .99892  .01704
.004  .00199   .99998  .00201  |  .034  .01642   .99884  .01758
.005  .00249   .99998  .00251  |  .035  .01689   .99878  .01811
.006  .00298   .99996  .00302  |  .036  .01735   .99870  .01865
.007  .00348   .99996  .00352  |  .037  .01782   .99864  .01918
.008  .00397   .99994  .00403  |  .038  .01828   .99856  .01972
.009  .00446   .99992  .00454  |  .039  .01874   .99848  .02026
.010  .00495   .99990  .00505  |  .040  .01920   .99840  .02080
.011  .00544   .99988  .00556  |  .041  .01966   .99832  .02134
.012  .00593   .99986  .00607  |  .042  .02012   .99824  .02188
.013  .00642   .99984  .00658  |  .043  .02058   .99816  .02242
.014  .00690   .99980  .00710  |  .044  .02103   .99806  .02297
.015  .00739   .99978  .00761  |  .045  .02149   .99798  .02351
.016  .00787   .99974  .00813  |  .046  .02194   .99788  .02406
.017  .00836   .99972  .00864  |  .047  .02240   .99780  .02460
.018  .00884   .99968  .00916  |  .048  .02285   .99770  .02515
.019  .00932   .99964  .00968  |  .049  .02330   .99760  .02570
.020  .00980   .99960  .01020  |  .050  .02375   .99750  .02625
.021  .01028   .99956  .01072  |  .051  .02420   .99740  .02680
.022  .01076   .99952  .01124  |  .052  .02465   .99730  .02735
.023  .01124   .99948  .01176  |  .053  .02510   .99720  .02790
.024  .01171   .99942  .01229  |  .054  .02554   .99708  .02846
.025  .01219   .99938  .01281  |  .055  .02599   .99698  .02901
.026  .01266   .99932  .01334  |  .056  .02643   .99686  .02957
.027  .01314   .99928  .01386  |  .057  .02688   .99676  .03012
.028  .01361   .99922  .01439  |  .058  .02732   .99664  .03068
.029  .01408   .99916  .01492  |  .059  .02776   .99652  .03124
.030  .01455   .99910  .01545  |  .060  .02820   .99640  .03180

.060  .02820  .99640  .03180  |  .090  .04095  .99190  .04905
.061  .02864  .99628  .03236  |  .091  .04136  .99172  .04964
.062  .02908  .99616  .03292  |  .092  .04177  .99154  .05023
.063  .02952  .99604  .03348  |  .093  .04218  .99136  .05082
.064  .02995  .99590  .03405  |  .094  .04258  .99116  .05142
.065  .03039  .99578  .03461  |  .095  .04299  .99098  .05201
.066  .03082  .99564  .03518  |  .096  .04339  .99078  .05261
.067  .03126  .99552  .03574  |  .097  .04380  .99060  .05320
.068  .03169  .99538  .03631  |  .098  .04420  .99040  .05380
.069  .03212  .99524  .03688  |  .099  .04460  .99020  .05440
.070  .03255  .99510  .03745  |  .100  .04500  .99000  .05500
.071  .03298  .99496  .03802  |  .101  .04540  .98980  .05560
.072  .03341  .99482  .03859  |  .102  .04580  .98960  .05620
.073  .03384  .99468  .03916  |  .103  .04620  .98940  .05680
.074  .03426  .99452  .03974  |  .104  .04659  .98918  .05741
.075  .03469  .99438  .04031  |  .105  .04699  .98898  .05801
.076  .03511  .99422  .04089  |  .106  .04738  .98876  .05862
.077  .03554  .99408  .04146  |  .107  .04778  .98856  .05922
.078  .03596  .99392  .04204  |  .108  .04817  .98834  .05983
.079  .03638  .99376  .04262  |  .109  .04856  .98812  .06044
.080  .03680  .99360  .04320  |  .110  .04895  .98790  .06105
.081  .03722  .99344  .04378  |  .111  .04934  .98768  .06166
.082  .03764  .99328  .04436  |  .112  .04973  .98746  .06227
.083  .03806  .99312  .04494  |  .113  .05012  .98724  .06288
.084  .03847  .99294  .04553  |  .114  .05050  .98700  .06350
.085  .03889  .99278  .04611  |  .115  .05089  .98678  .06411
.086  .03930  .99260  .04670  |  .116  .05127  .98654  .06473
.087  .03972  .99244  .04728  |  .117  .05166  .98632  .06534
.088  .04013  .99226  .04787  |  .118  .05204  .98608  .06596
.089  .04054  .99208  .04846  |  .119  .05242  .98584  .06658
.090  .04095  .99190  .04905  |  .120  .05280  .98560  .06720

.120  .05280  .98560  .06720  |  .150  .06375  .97750  .08625
.121  .05318  .98536  .06782  |  .151  .06410  .97720  .08690
.122  .05356  .98512  .06844  |  .152  .06445  .97690  .08755
.123  .05394  .98488  .06906  |  .153  .06480  .97660  .08820
.124  .05431  .98462  .06969  |  .154  .06514  .97628  .08886
.125  .05469  .98438  .07031  |  .155  .06549  .97598  .08951
.126  .05506  .98412  .07094  |  .156  .06583  .97566  .09017
.127  .05544  .98388  .07156  |  .157  .06618  .97536  .09082
.128  .05581  .98362  .07219  |  .158  .06652  .97504  .09148
.129  .05618  .98336  .07282  |  .159  .06686  .97472  .09214
.130  .05655  .98310  .07345  |  .160  .06720  .97440  .09280
.131  .05692  .98284  .07408  |  .161  .06754  .97408  .09346
.132  .05729  .98258  .07471  |  .162  .06788  .97376  .09412
.133  .05766  .98232  .07534  |  .163  .06822  .97344  .09478
.134  .05802  .98204  .07598  |  .164  .06855  .97310  .09545
.135  .05839  .98178  .07661  |  .165  .06889  .97278  .09611
.136  .05875  .98150  .07725  |  .166  .06922  .97244  .09678
.137  .05912  .98124  .07788  |  .167  .06956  .97212  .09744
.138  .05948  .98096  .07852  |  .168  .06989  .97178  .09811
.139  .05984  .98068  .07916  |  .169  .07022  .97144  .09878
.140  .06020  .98040  .07980  |  .170  .07055  .97110  .09945
.141  .06056  .98012  .08044  |  .171  .07088  .97076  .10012
.142  .06092  .97984  .08108  |  .172  .07121  .97042  .10079
.143  .06128  .97956  .08172  |  .173  .07154  .97008  .10146
.144  .06163  .97926  .08237  |  .174  .07186  .96972  .10214
.145  .06199  .97898  .08301  |  .175  .07219  .96938  .10281
.146  .06234  .97868  .08366  |  .176  .07251  .96902  .10349
.147  .06270  .97840  .08430  |  .177  .07284  .96868  .10416
.148  .06305  .97810  .08495  |  .178  .07316  .96832  .10484
.149  .06340  .97780  .08560  |  .179  .07348  .96796  .10552
.150  .06375  .97750  .08625  |  .180  .07380  .96760  .10620

.180  .07380  .96760  .10620  |  .210  .08295  .95590  .12705
.181  .07412  .96724  .10688  |  .211  .08324  .95548  .12776
.182  .07444  .96688  .10756  |  .212  .08353  .95506  .12847
.183  .07476  .96652  .10824  |  .213  .08382  .95464  .12918
.184  .07507  .96614  .10893  |  .214  .08410  .95420  .12990
.185  .07539  .96578  .10961  |  .215  .08439  .95378  .13061
.186  .07570  .96540  .11030  |  .216  .08467  .95334  .13133
.187  .07602  .96504  .11098  |  .217  .08496  .95292  .13204
.188  .07633  .96466  .11167  |  .218  .08524  .95248  .13276
.189  .07664  .96428  .11236  |  .219  .08552  .95204  .13348
.190  .07695  .96390  .11305  |  .220  .08580  .95160  .13420
.191  .07726  .96352  .11374  |  .221  .08608  .95116  .13492
.192  .07757  .96314  .11443  |  .222  .08636  .95072  .13564
.193  .07788  .96276  .11512  |  .223  .08664  .95028  .13636
.194  .07818  .96236  .11582  |  .224  .08691  .94982  .13709
.195  .07849  .96198  .11651  |  .225  .08719  .94938  .13781
.196  .07879  .96158  .11721  |  .226  .08746  .94892  .13854
.197  .07910  .96120  .11790  |  .227  .08774  .94848  .13926
.198  .07940  .96080  .11860  |  .228  .08801  .94802  .13999
.199  .07970  .96040  .11930  |  .229  .08828  .94756  .14072
.200  .08000  .96000  .12000  |  .230  .08855  .94710  .14145
.201  .08030  .95960  .12070  |  .231  .08882  .94664  .14218
.202  .08060  .95920  .12140  |  .232  .08909  .94618  .14291
.203  .08090  .95880  .12210  |  .233  .08936  .94572  .14364
.204  .08119  .95838  .12281  |  .234  .08962  .94524  .14438
.205  .08149  .95798  .12351  |  .235  .08989  .94478  .14511
.206  .08178  .95756  .12422  |  .236  .09015  .94430  .14585
.207  .08208  .95716  .12492  |  .237  .09042  .94384  .14658
.208  .08237  .95674  .12563  |  .238  .09068  .94336  .14732
.209  .08266  .95632  .12634  |  .239  .09094  .94288  .14806
.210  .08295  .95590  .12705  |  .240  .09120  .94240  .14880

.240  .09120  .94240  .14880  |  .270  .09855  .92710  .17145
.241  .09146  .94192  .14954  |  .271  .09878  .92656  .17222
.242  .09172  .94144  .15028  |  .272  .09901  .92602  .17299
.243  .09198  .94096  .15102  |  .273  .09924  .92548  .17376
.244  .09223  .94046  .15177  |  .274  .09946  .92492  .17454
.245  .09249  .93998  .15251  |  .275  .09969  .92438  .17531
.246  .09274  .93948  .15326  |  .276  .09991  .92382  .17609
.247  .09300  .93900  .15400  |  .277  .10014  .92328  .17686
.248  .09325  .93850  .15475  |  .278  .10036  .92272  .17764
.249  .09350  .93800  .15550  |  .279  .10058  .92216  .17842
.250  .09375  .93750  .15625  |  .280  .10080  .92160  .17920
.251  .09400  .93700  .15700  |  .281  .10102  .92104  .17998
.252  .09425  .93650  .15775  |  .282  .10124  .92048  .18076
.253  .09450  .93600  .15850  |  .283  .10146  .91992  .18154
.254  .09474  .93548  .15926  |  .284  .10167  .91934  .18233
.255  .09499  .93498  .16001  |  .285  .10189  .91878  .18311
.256  .09523  .93446  .16077  |  .286  .10210  .91820  .18390
.257  .09548  .93396  .16152  |  .287  .10232  .91764  .18468
.258  .09572  .93344  .16228  |  .288  .10253  .91706  .18547
.259  .09596  .93292  .16304  |  .289  .10274  .91648  .18626
.260  .09620  .93240  .16380  |  .290  .10295  .91590  .18705
.261  .09644  .93188  .16456  |  .291  .10316  .91532  .18784
.262  .09668  .93136  .16532  |  .292  .10337  .91474  .18863
.263  .09692  .93084  .16608  |  .293  .10358  .91416  .18942
.264  .09715  .93030  .16685  |  .294  .10378  .91356  .19022
.265  .09739  .92978  .16761  |  .295  .10399  .91298  .19101
.266  .09762  .92924  .16838  |  .296  .10419  .91238  .19181
.267  .09786  .92872  .16914  |  .297  .10440  .91180  .19260
.268  .09809  .92818  .16991  |  .298  .10460  .91120  .19340
.269  .09832  .92764  .17068  |  .299  .10480  .91060  .19420
.270  .09855  .92710  .17145  |  .300  .10500  .91000  .19500

.300  .10500  .91000  .19500  |  .330  .11055  .89110  .21945
.301  .10520  .90940  .19580  |  .331  .11072  .89044  .22028
.302  .10540  .90880  .19660  |  .332  .11089  .88978  .22111
.303  .10560  .90820  .19740  |  .333  .11106  .88912  .22194
.304  .10579  .90758  .19821  |  .334  .11122  .88844  .22278
.305  .10599  .90698  .19901  |  .335  .11139  .88778  .22361
.306  .10618  .90636  .19982  |  .336  .11155  .88710  .22445
.307  .10638  .90576  .20062  |  .337  .11172  .88644  .22528
.308  .10657  .90514  .20143  |  .338  .11188  .88576  .22612
.309  .10676  .90452  .20224  |  .339  .11204  .88508  .22696
.310  .10695  .90390  .20305  |  .340  .11220  .88440  .22780
.311  .10714  .90328  .20386  |  .341  .11236  .88372  .22864
.312  .10733  .90266  .20467  |  .342  .11252  .88304  .22948
.313  .10752  .90204  .20548  |  .343  .11268  .88236  .23032
.314  .10770  .90140  .20630  |  .344  .11283  .88166  .23117
.315  .10789  .90078  .20711  |  .345  .11299  .88098  .23201
.316  .10807  .90014  .20793  |  .346  .11314  .88028  .23286
.317  .10826  .89952  .20874  |  .347  .11330  .87960  .23370
.318  .10844  .89888  .20956  |  .348  .11345  .87890  .23455
.319  .10862  .89824  .21038  |  .349  .11360  .87820  .23540
.320  .10880  .89760  .21120  |  .350  .11375  .87750  .23625
.321  .10898  .89696  .21202  |  .351  .11390  .87680  .23710
.322  .10916  .89632  .21284  |  .352  .11405  .87610  .23795
.323  .10934  .89568  .21366  |  .353  .11420  .87540  .23880
.324  .10951  .89502  .21449  |  .354  .11434  .87468  .23966
.325  .10969  .89438  .21531  |  .355  .11449  .87398  .24051
.326  .10986  .89372  .21614  |  .356  .11463  .87326  .24137
.327  .11004  .89308  .21696  |  .357  .11478  .87256  .24222
.328  .11021  .89242  .21779  |  .358  .11492  .87184  .24308
.329  .11038  .89176  .21862  |  .359  .11506  .87112  .24394
.330  .11055  .89110  .21945  |  .360  .11520  .87040  .24480

.360  .11520  .87040  .24480  |  .390  .11895  .84790  .27105
.361  .11534  .86968  .24566  |  .391  .11906  .84712  .27194
.362  .11548  .86896  .24652  |  .392  .11917  .84634  .27283
.363  .11562  .86824  .24738  |  .393  .11928  .84556  .27372
.364  .11575  .86750  .24825  |  .394  .11938  .84476  .27462
.365  .11589  .86678  .24911  |  .395  .11949  .84398  .27551
.366  .11602  .86604  .24998  |  .396  .11959  .84318  .27641
.367  .11616  .86532  .25084  |  .397  .11970  .84240  .27730
.368  .11629  .86458  .25171  |  .398  .11980  .84160  .27820
.369  .11642  .86384  .25258  |  .399  .11990  .84080  .27910
.370  .11655  .86310  .25345  |  .400  .12000  .84000  .28000
.371  .11668  .86236  .25432  |  .401  .12010  .83920  .28090
.372  .11681  .86162  .25519  |  .402  .12020  .83840  .28180
.373  .11694  .86088  .25606  |  .403  .12030  .83760  .28270
.374  .11706  .86012  .25694  |  .404  .12039  .83678  .28361
.375  .11719  .85938  .25781  |  .405  .12049  .83598  .28451
.376  .11731  .85862  .25869  |  .406  .12058  .83516  .28542
.377  .11744  .85788  .25956  |  .407  .12068  .83436  .28632
.378  .11756  .85712  .26044  |  .408  .12077  .83354  .28723
.379  .11768  .85636  .26132  |  .409  .12086  .83272  .28814
.380  .11780  .85560  .26220  |  .410  .12095  .83190  .28905
.381  .11792  .85484  .26308  |  .411  .12104  .83108  .28996
.382  .11804  .85408  .26396  |  .412  .12113  .83026  .29087
.383  .11816  .85332  .26484  |  .413  .12122  .82944  .29178
.384  .11827  .85254  .26573  |  .414  .12130  .82860  .29270
.385  .11839  .85178  .26661  |  .415  .12139  .82778  .29361
.386  .11850  .85100  .26750  |  .416  .12147  .82694  .29453
.387  .11862  .85024  .26838  |  .417  .12156  .82612  .29544
.388  .11873  .84946  .26927  |  .418  .12164  .82528  .29636
.389  .11884  .84868  .27016  |  .419  .12172  .82444  .29728
.390  .11895  .84790  .27105  |  .420  .12180  .82360  .29820

.420  .12180  .82360  .29820  |  .450  .12375  .79750  .32625
.421  .12188  .82276  .29912  |  .451  .12380  .79660  .32720
.422  .12196  .82192  .30004  |  .452  .12385  .79570  .32815
.423  .12204  .82108  .30096  |  .453  .12390  .79480  .32910
.424  .12211  .82022  .30189  |  .454  .12394  .79388  .33006
.425  .12219  .81938  .30281  |  .455  .12399  .79298  .33101
.426  .12226  .81852  .30374  |  .456  .12403  .79206  .33197
.427  .12234  .81768  .30466  |  .457  .12408  .79116  .33292
.428  .12241  .81682  .30559  |  .458  .12412  .79024  .33388
.429  .12248  .81596  .30652  |  .459  .12416  .78932  .33484
.430  .12255  .81510  .30745  |  .460  .12420  .78840  .33580
.431  .12262  .81424  .30838  |  .461  .12424  .78748  .33676
.432  .12269  .81338  .30931  |  .462  .12428  .78656  .33772
.433  .12276  .81252  .31024  |  .463  .12432  .78564  .33868
.434  .12282  .81164  .31118  |  .464  .12435  .78470  .33965
.435  .12289  .81078  .31211  |  .465  .12439  .78378  .34061
.436  .12295  .80990  .31305  |  .466  .12442  .78284  .34158
.437  .12302  .80904  .31398  |  .467  .12446  .78192  .34254
.438  .12308  .80816  .31492  |  .468  .12449  .78098  .34351
.439  .12314  .80728  .31586  |  .469  .12452  .78004  .34448
.440  .12320  .80640  .31680  |  .470  .12455  .77910  .34545
.441  .12326  .80552  .31774  |  .471  .12458  .77816  .34642
.442  .12332  .80464  .31868  |  .472  .12461  .77722  .34739
.443  .12338  .80376  .31962  |  .473  .12464  .77628  .34836
.444  .12343  .80286  .32057  |  .474  .12466  .77532  .34934
.445  .12349  .80198  .32151  |  .475  .12469  .77438  .35031
.446  .12354  .80108  .32246  |  .476  .12471  .77342  .35129
.447  .12360  .80020  .32340  |  .477  .12474  .77248  .35226
.448  .12365  .79930  .32435  |  .478  .12476  .77152  .35324
.449  .12370  .79840  .32530  |  .479  .12478  .77056  .35422
.450  .12375  .79750  .32625  |  .480  .12480  .76960  .35520
. 79750 


TABLE XV A
THREE-POINT INTERPOLATION COEFFICIENTS

  p     C0      C1      C2
        -       +       +

.480 .12480 .76960 .35520

.481 .12482 .76864 .35618
.482 .12484 .76768 .35716
.483 .12486 .76672 .35814

.484 .12487 .76574 .35913
.485 .12489 .76478 .36011
.486 .12490 .76380 .36110

.487 .12492 .76284 .36208
.488 .12493 .76186 .36307
.489 .12494 .76088 .36406

.490 .12495 .75990 .36505

.491 .12496 .75892 .36604
.492 .12497 .75794 .36703
.493 .12498 .75696 .36802

.494 .12498 .75596 .36902
.495 .12499 .75498 .37001
.496 .12499 .75398 .37101

.497 .12500 .75300 .37200
.498 .12500 .75200 .37300
.499 .12500 .75100 .37400

.500 .12500 .75000 .37500
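
The tabulated magnitudes agree with the Lagrange weights for three equally spaced entries, which suggests the minimal sketch below. It assumes that p is measured from the middle entry as a fraction of the tabular interval, and that the three coefficients multiply the preceding, middle, and following entries with the signs printed under the column heads. The function names are illustrative and do not come from the text.

```python
def three_point_coefficients(p):
    """Return the three tabulated magnitudes (C0, C1, C2) for argument p."""
    c0 = p * (1 - p) / 2  # carries a minus sign; assumed to weight the entry before the middle one
    c1 = 1 - p * p        # carries a plus sign; assumed to weight the middle entry
    c2 = p * (1 + p) / 2  # carries a plus sign; assumed to weight the entry after the middle one
    return c0, c1, c2


def interpolate3(y_prev, y_mid, y_next, p):
    """Three-point interpolation at a fraction p of the interval past the middle entry."""
    c0, c1, c2 = three_point_coefficients(p)
    return -c0 * y_prev + c1 * y_mid + c2 * y_next


if __name__ == "__main__":
    # Compare the computed magnitudes with the table row for p = .420.
    print([round(c, 5) for c in three_point_coefficients(0.42)])
    # A parabola is reproduced exactly: y = t**2 at t = -1, 0, 1, read at t = .42.
    print(interpolate3(1.0, 0.0, 1.0, 0.42))
```

Because the signed weights sum to unity, a constant function is returned unchanged, which gives a quick check on any row of the table.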
TABLE XV B
FOUR-POINT INTERPOLATION COEFFICIENTS

  p        C0          C1          C2          C3        p
(p<.5)     -           +           +           -      (p>.5)

 .00    .00000 00  1.00000 00   .00000 00   .00000 00  1.00

 .01    .00328 35   .99490 05   .01004 95   .00166 65   .99
 .02    .00646 80   .98960 40   .02019 60   .00333 20   .98
 .03    .00955 45   .98411 35   .03043 65   .00499 55   .97

 .04    .01254 40   .97843 20   .04076 80   .00665 60   .96
 .05    .01543 75   .97256 25   .05118 75   .00831 25   .95
 .06    .01823 60   .96650 80   .06169 20   .00996 40   .94

 .07    .02094 05   .96027 15   .07227 85   .01160 95   .93
 .08    .02355 20   .95385 60   .08294 40   .01324 80   .92
 .09    .02607 15   .94726 45   .09368 55   .01487 85   .91

 .10    .02850 00   .94050 00   .10450 00   .01650 00   .90

 .11    .03083 85   .93356 55   .11538 45   .01811 15   .89
 .12    .03308 80   .92646 40   .12633 60   .01971 20   .88
 .13    .03524 95   .91919 85   .13735 15   .02130 05   .87

 .14    .03732 40   .91177 20   .14842 80   .02287 60   .86
 .15    .03931 25   .90418 75   .15956 25   .02443 75   .85
 .16    .04121 60   .89644 80   .17075 20   .02598 40   .84

 .17    .04303 55   .88855 65   .18199 35   .02751 45   .83
 .18    .04477 20   .88051 60   .19328 40   .02902 80   .82
 .19    .04642 65   .87232 95   .20462 05   .03052 35   .81

 .20    .04800 00   .86400 00   .21600 00   .03200 00   .80

 .21    .04949 35   .85553 05   .22741 95   .03345 65   .79
 .22    .05090 80   .84692 40   .23887 60   .03489 20   .78
 .23    .05224 45   .83818 35   .25036 65   .03630 55   .77

 .24    .05350 40   .82931 20   .26188 80   .03769 60   .76
 .25    .05468 75   .82031 25   .27343 75   .03906 25   .75
 .26    .05579 60   .81118 80   .28501 20   .04040 40   .74

 .27    .05683 05   .80194 15   .29660 85   .04171 95   .73
 .28    .05779 20   .79257 60   .30822 40   .04300 80   .72
 .29    .05868 15   .78309 45   .31985 55   .04426 85   .71

 .30    .05950 00   .77350 00   .33150 00   .04550 00   .70
TABLE XV B
FOUR-POINT INTERPOLATION COEFFICIENTS

  p        C0          C1          C2          C3        p
(p<.5)     -           +           +           -      (p>.5)

 .30    .05950 00   .77350 00   .33150 00   .04550 00   .70

 .31    .06024 85   .76379 55   .34315 45   .04670 15   .69
 .32    .06092 80   .75398 40   .35481 60   .04787 20   .68
 .33    .06153 95   .74406 85   .36648 15   .04901 05   .67

 .34    .06208 40   .73405 20   .37814 80   .05011 60   .66
 .35    .06256 25   .72393 75   .38981 25   .05118 75   .65
 .36    .06297 60   .71372 80   .40147 20   .05222 40   .64

 .37    .06332 55   .70342 65   .41312 35   .05322 45   .63
 .38    .06361 20   .69303 60   .42476 40   .05418 80   .62
 .39    .06383 65   .68255 95   .43639 05   .05511 35   .61

 .40    .06400 00   .67200 00   .44800 00   .05600 00   .60

 .41    .06410 35   .66136 05   .45958 95   .05684 65   .59
 .42    .06414 80   .65064 40   .47115 60   .05765 20   .58
 .43    .06413 45   .63985 35   .48269 65   .05841 55   .57

 .44    .06406 40   .62899 20   .49420 80   .05913 60   .56
 .45    .06393 75   .61806 25   .50568 75   .05981 25   .55
 .46    .06375 60   .60706 80   .51713 20   .06044 40   .54

 .47    .06352 05   .59601 15   .52853 85   .06102 95   .53
 .48    .06323 20   .58489 60   .53990 40   .06156 80   .52
 .49    .06289 15   .57372 45   .55122 55   .06205 85   .51

 .50    .06250 00   .56250 00   .56250 00   .06250 00   .50
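
The four tabulated magnitudes agree with the Lagrange weights for four equally spaced entries, so a row can be regenerated or checked as in the sketch below. It assumes that p is the fraction of the interval between the two middle entries, and that the coefficients, in column order, multiply the four successive entries with the printed signs (minus, plus, plus, minus). The function names are illustrative, not the author's.

```python
def four_point_coefficients(p):
    """Return the four tabulated magnitudes, in column order, for argument p."""
    a = p * (1 - p) * (2 - p) / 6        # minus sign; assumed to weight the entry before the interval
    b = (1 + p) * (1 - p) * (2 - p) / 2  # plus sign; entry at the start of the interval
    c = p * (1 + p) * (2 - p) / 2        # plus sign; entry at the end of the interval
    d = p * (1 - p) * (1 + p) / 6        # minus sign; entry after the interval
    return a, b, c, d


def interpolate4(y_before, y_start, y_end, y_after, p):
    """Four-point interpolation at a fraction p of the way from y_start to y_end."""
    a, b, c, d = four_point_coefficients(p)
    return -a * y_before + b * y_start + c * y_end - d * y_after


if __name__ == "__main__":
    # Compare with the table row for p = .25.
    print([round(v, 7) for v in four_point_coefficients(0.25)])
    # A cubic is reproduced exactly: y = t**3 at t = -1, 0, 1, 2, read at t = .25.
    print(interpolate4(-1.0, 0.0, 1.0, 8.0, 0.25))
```

Replacing p by 1 - p reverses the order of the four magnitudes, which is consistent with the second argument column headed (p>.5).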


SECTION 3. NORMAL PROBABILITY FUNCTIONS 


The arguments p and q of Table XV C are the
areas of the larger and the smaller tails of a
unit normal distribution dichotomized at the
point x, which is a standard score deviate. I
is the area from the mean to this point x, and z
is the ordinate at the point x.

The E'' footings at the bottoms of the columns
of this table give the maximum linear
interpolation errors for I, p, or q arguments
halfway down the columns. On the last two pages
of this table E''', the maximum three-point
interpolation error when I, p, or q is the
argument, is given. The E-'' footings are the
maximum inverse linear interpolation errors in
I, p, or q when x is the argument.
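
The quantities described above can be recomputed for any argument p, which is useful for checking a doubtful entry. The sketch below is only an illustration under stated assumptions: it uses `statistics.NormalDist` from the Python standard library, takes p as the larger area, and uses the column labels I, x, z, q, z/q, z/p, and pq as dictionary keys. The function name `normal_row` is hypothetical.

```python
from statistics import NormalDist


def normal_row(p):
    """Recompute one row of normal functions for the larger area p (0.5 <= p < 1)."""
    unit_normal = NormalDist()   # mean 0, standard deviation 1
    q = 1 - p                    # area of the smaller tail
    x = unit_normal.inv_cdf(p)   # standard score deviate at the point of dichotomy
    z = unit_normal.pdf(x)       # ordinate at x
    return {
        "I": p - 0.5,            # area from the mean to x
        "x": x,
        "z": z,
        "q": q,
        "z/q": z / q,
        "z/p": z / p,
        "pq": p * q,
    }


if __name__ == "__main__":
    for name, value in normal_row(0.9).items():
        print(f"{name:>4} {value:.6f}")
```

For p = .900 the values can be compared with the table row in which x = 1.281552 and z = .175498.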
TABLE XV C


x 


.000000 | .398942 | - 79788 |. .250000 


“002507 |.398941 | - 79948 |. -249999 
.005013 |.398937 | ~ :80108 |. .249996 
.007520 |.398931 | . .80268 |. .249991 


.010027 |.398922 | . .80428 |. .249984 
.012533 |.398911 | - 80588 |. .249975 
.015040 | .398897 | - 80748 | . 249964 


017547 | -398881 | . .80909 |. :249951 
020054 |.398862 | . 81070 |. .249936 
.022562 | .398841 | ~ .81230 |. .249919 


81391 |. .249900 


:398791 |. 481552 |. ‚249879 
398762 | - 481714 | . .249856 
.398730 | - „81875 | . .249831 
.82036 |. .249804 
82198 |. 249775 
482360 |. 249744 
.82522 |. 5 .249711 
.82684 |. .249676 
.82846 |. .249639 


.050154 |. .83008 |. .249600 


.052664 Зил |. 249559 
2055174 |. 483334 |. 2149516 
057684 |. 483497 |. 249471 


.060195 | .398220 | . .83660 „249424 
:062707 | .398159 | - 483823 |. :249375 
.065219 |.398096 | . .83986 |. ў ‚249324 


.067731 |.398028 84150 |. ‚249271 
.070243 |-397959 ‚84292 |. ‚249216 
:072750 |.397888 484477 |. :249159 


.075270 |.397814 84641 |. .249100 
:07778. 397737 | - 484805 |. .249039 
08029) -397658 484970 |. ‚248976 
.082813 |.397577 | - 485134 |. „248911 
:085329 |.397493 | - 85299 |. 248844 
:397406 |. 485464 |. :248775 
:397317 |. 485629 | - 248704 
«397225 | - A E .248631 
-397131 | - : 3 -248556 
| 397034 = dre B .248479 


-396935 E 5 E 48400 


4000000 


х 


102953 
105474 
„107995 


‚110516 
113039 
115562 
„118085 
„120610 
123135 


TABLE XV C (CONTINUED)


100434 |. 
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1396834 E 73352 
1396729 | -458 | .86622 | .73197 
.396623 | .457 | -86788 | -73043 
‚396513 | -456 .86955 | .72888 
-396401 | .455 .87121 | .72734 
396287 | .454 | -87288 | .72580 
396170 | -453 | 487455 | -72426 
.396051 | .452 87622 | .72272 
395929 | - 72118 


1248319 
.248236 
.248151 
.248064 
247975 
247884 
247791 


247696 
247599 


125661 


128188 
130716 
133245 


135774 
.138304 
140835 
‚143367 
‚145900 
148434. 


+153505 


161119 
„163658 
.166199 
‚168741 
171405 
‚173829 


‚178921 
181468 


184017 


‚186567 
‚189118 
191671 


«194225 
‚196780 
«199336 


„150969 |. 


392960 


176374 


„392608 
«392427 
-392245 


-392059 
‚391870 
‚391681 


-391488 


.391293 
GE 


395678 | .449 | -88124 | -71811 
395549 | .448 | -88292 | .71657 
395417 | 4447 | -88460 | .71504 

„446 88628 | .71350 
395145 | - 88797 


71197 


394863 70891 
394719 | .442 | .89303 | .70738 
394572 | 441 | -89472 | .70585 


394422 E 

394270 | .439 .89811 | .70280 
E .438 | -89981 | .70127 
:393957 | -437 90150 | .69975 
:393798 | -436 .90321 | .69822 
.393635 | .435 | -90491 | .69670 
393470 | -434 .90661 | .69518 
.303303 | -433 | -90832 | .69366 
„393133 432 91003 


.201893 


Й 69214 |. 
E 69062 | 3 
E 68910 |. 


:247399 
247296 
‚247191 


247084 
«246975 
246864 


‚246751 
246636 
246519 


246279 
,246156 
.246031 


245904 
245775 
245644 


245511 
245376 
245239 
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T 
201893 |. 


.243600 


204452 
207013 
209574 


.212137 
.214702 
.217267 


:219835 
.222403 
:224973 |. 
227545 
230118 
:232693 
235269 
:237847 
:240426 
‚243007 
245590 
248174 
.250760 


253347 |- 


-243439 
-243276 
243111 


242944 
-242775 
-242604 
-242431 
.242256 
:242079 
:241900 
.241719 
.241536 
241351 
241164 
:240975 
:240784 


«240591 
.240396 


.240199 |. 


.240000 


2559936 
258527 
„261120 | . 


263714 
311 


.271508 |. 
274110 |. 
.27 


:279319 |. 


:239799 
-239596 
.239391 


239184. 
238975 
238764 


.238551 
.238336 
238119 


_:237900 


-237679 
-237456 
‚237231 
-237004 
.236775 
:236544 
.236311 
.236076 
«235839 
TABLE XV C (CONTINUED)


z 


-380755 


«380449 
«380139 
:379827 


„379513 
„379195 
378875 


378553 
„378227 
377900 


.235600 


235359 
‚235116 


234871 


,234624 
„234375 
234124 
«233871 
„233616 
‚2353359 


377599 


.233100 


:377236 


‚376220 
375877 
„375530 


«375181 
‚374829 


| 374475 
„374118 


.232839 
.232576 
232311 
232044 
231775 
231504 
231231 
.230956 | .! 
.230679 | | 


.230400 


374544 


377254 
379927 
„382622 


‚373758 
„373395 
373030 
‚372662 
‚372292 
„371919. 
71543 


371164. 
:370783 


230119 
‚229836 
.229551 
,229264 
228975 
„228684 


,228391 
1228096 


:370399 


„370012 
TABLE XV C (CONTINUED)


I | x | z | q | z/q | z/p | pq | p
4412463 | .366410 1.07768 | -55517 | -224420 
-161 | .415194 | .365996 | .339 | 1.07963 | .55370 | -224079 
.162 | .417928 | .365580 | .338 | 1.08160 | .55224 | .223756 
.163 | .420665 | -365160 | .337 | 1.08356 | .55077 | .223431 
164 | .423405 | .364738 | .336 | 1.08553 | .54930 | .223104 
165 | .426148 | .364314 | .335 | 1.08750 | .54784 | .222775 
.166 | .428895 | .363886 | .334 | 1.08948 | .54638 | .222444 
.167 | .431644 | -363456 | -333 | 1.09146 | .54491 | .222111 
168 | .434397 | -363023 | -332 | 1.09344 | .54345 | -221776 
.169 | .437154 | -362587 | -331 | 1.09543 .221439 
170 | .439913 | .362149 | -330 | 1.09742 .221100 
4171 | .442676 | .361707 | .329 | 1.09941 |. .220759 
172 445443 | -361263 | .328 | r.10141 | .53759 | -220416 
4173 | .448212 | .360817 | .327 | 1.10342 | .53613 | .220071 
4174 | 4450986 | .360367 | -326 | 1.10542 | .53467 | -219724 
175 | -453762 | .359915 | -325 | 1.10743 | .53321 | 219375 |. 
176 | 456542 | .359459 | -324 | 1.10944 | .53174 | -219024 | .676 
177 | .459326 | .359001 | .323 | 1.11146 | .53028 | .218671 | .677 
178 | .462113 | .358541 | .322 | 1.11348 | .52882 | .218316 | .678 
179 | .464904 | .358077 | .321 | 1.11550 217959 | .679 
180 | .467699 | .357611 | .320 -11753 .217600 | ‚680 
181 | .470497 | 3571424 .319 | 1.11957 | .52444 | -217239 | .681 
182 | .473299 | .356670 | .318 | 1.12160 | .52298 | .216876 | .682 
183 | .476104 | .356195 | .317 1.12364 | .52152 | .216511 | .683 
4184 | .478914 | .355718 | .316 | 1.12569 | .52006 | .216144 | .684 
-185 | .481727 | .355237 | -315 | 1.12774 | .51859 | -215775 | .685 
186 | .484544 | 354754 | -314 | 1.12079 | .51713 | .215404 | .686 
187 | .487365 | .354268 | .313 | 1.13185 | .51567 | .215031 | .687 
188 | .490189 | .353780 | .312 | 1.13391 +51422 | .214656 | .688 
.189 | .493016 | .353288 | .311 | 1.13597 | .51275 .214279 | .689 
3190 | .495850 |.352793 | -310 | 1.13804 | .51129 | .213900 | .690 
191 | .498687 | .352296 | .309 14012 |. .213519 | .691 
.192 501527 | .351796 | -308 | 1.14219 .50838 | .213136 |. 
193 | .504372 | .351293 | .307 | 1.14428 | .50002 | .212751 
4194 | .507221 | .350787 | -306 | 1.14636 | .50546 .212364 
195 | -510073 | .350279 | .305 | 1.14846 | .50400 | .211975 
196 | .512930 | .349767 | .304 | 1.15055 | .50254 | .211584 
4197 | -515792 | .349253 | -303 | 1.15265 | .50108 | .211191 
-198 | .518657 | .348736 | .302 1.15475 | .49962 | .210796 
199 | .521527 | .348216 | .301 | 1.15686 | .49816 .210399 
524401 | .347693 1.15898 | .49670 | .210999 | .700 


aso 000000 
00000 


«000004 |.00000*|. TINE 


—  — D 
TABLE XV C (CONTINUED)


x | z | z/q | z/p | pq
.524401 | .347693 1.15898 | .49670 | .210000 
-527279 | -347167 1.16109 | .49525 | -209599 
.530161 | -346638 1.16321 | .49379 .209196 
.533048 | -346107 1.16534 | -49233 | -208791 
:535940 | -345572 1.16747 | .49087 | 208384 
.538836 | .345035 1.16961 | «48041 | -207975 
:541737 | :344494 1.17175 | 48795 | -207564 
:544642 | .343951 1.17389 | 48649 | :207151 
-547551 | -343405 1.17604 | .48504 | -206736 
:550466 | -342856 1.17820 | .48358 | -206319 
.553385 | -342304 1.18036 | .48212 | .205900 


1.18252 | .48066 


.556308 | .341749 .205479 
«559237 | 341191 1.18469 | .47920 205056 
562170 | -340631 1.18687 | .47774 | -204631 
.565108 | .340067 1.18904 | .47628 | .204204 
.568051 | .339500 1.19123 | 47483 | -203775 
:570999 | -338931 1.19342 | -47337 | 203344 
.573952 | .338358 1.19561 | .47191 | -202911 
.5769то | .337783 1.19781 | .47045 | -202476 
.579873 | -337205 1.20002 | .46899 | .202039 


.582841 


336623 1.20223 | .46753 


:201600 


.585815 
:588793 


1.20444 | -46607 


«336039 
1.20666 | .46461 


335452 


‚201159 
‚200716 


.591777 |.334861 1.20888 | .46315 | .200271 
.594766 | .334268 1.21112 | .46170 | .199824 
-597760 | .333672 1.21335 | -46024 | -199375 
.600760 | .333073 1.21559 | -45878 | .198924 
.603765 | -332470 1.21784 | -45732 .198471 
.606775 | .331865 1.22009 | .45586 | .198016 
.609791 | -331257 1.22235 | -45440 | -197559 
.612813 | .330646 1.22461 | .45294 | -197100 
‚615840 | .330031 1.22688 | .45148 | .196639 
.618873 | -329414 1.22916 | .45002 .196176 
«621912 .328793 1.23143 | .44856 | -195711 
‚624956 | .328170 1.23372 | -44710 | -195244 
.628006 | .327544 1.23601 | .44564 | -194775 
.631062 | .326914 1,23831 | .44418 | -194304 
634124 | .326281 1.24061 | .44272 | .193831 
637192 | 325646 1.24292 | .44125 .193356 
640265 | .325007 1.24524 | -43979 | -192879 
324365 1.24756 192400 | .740 


00000+ |.vv000* 


000000 


„000000 ER 
TABLE XV C (CONTINUED)


I | x | z | z/q | z/p | pq
+643345 | -324365 1.24756 | 443833 | -192400 
241 .646431 | .323720 1.24988 | .43687 | .191919 
.242 | .649524 | .323072 1.25222 | .43541 | .191436 
.243 | .652622 | .322421 1.25456 | .43394 | .190951 
.244 | .655727 | 321767 1.25690 | .43248 | .190464 
.245 | .658838 | .321110 1.25925 | .43102 | .189975 
.246 661955 | -320449 1.26161 | .42956 | .189484 
.247 | .665079 | .319786 1.26398 | .42809 | .188991 
.248 | .668209 | 319119 1.26635 | .42663 | .188496 
Ё :318449 1.26872 | .42517 | .187999 
317777 1.27111 42370 | .187500 
2 .317101 1.27350 | -42224 | .186999 
252 | -680797 .316421 1.27589 | -42077 | .186496 
: E 315739 1.27830 | .41931 | .185991 
254 | / .315053 1.28070 | .41784 | .185484 
.255 | .690309 | .314365 .1.28312 | .41638 | .184975 
.256 | .693493 | 313673 1.28554 | -41491 | .184464 
.257 | .696685 | .312978 1.28798 | .41345 | .183951 
.258 | .699884 | .312279 1.29041 | .41198 183436 
+259 | .703090 | .311578 1.29285 | .41051 
.260 | .706303 | .310873 1.29531 | .40904 
.26т | .709523 | .310165 1.29776 | .40758 | .181879 
.262 | .712751 | .399454 1.30023 | .40611 | .181356 
+263 715986 | .308740 1.30270 | .40464 | .180831 
264 | .719229 | .308022 1.30518 | .40317 | .180304 
:265 | .722479 | .307301 1.30767 | -40170 | .179775 
«266 | .725737 | -306577 1.31016 | .40023 | .179244 
:267 | .729003 | .305850 1.31266 | .39876 | .178711 
.268 | .732276 | .305119 I.31517 | .39729 | .178176 
.269 | .735558 | -304385 1.31768 :| .39582 | .177639 
270 | .738847 | .303648 1.32021 | .39435 | .1771 00 
.271 | .742144 | .302908 1.32274 | .39288 | .1765 
272 | -745450 | 302164 132548 | 30140 | .1/6016 
273 | -748763 | -301417 1.32783 | -38993 || -175471 
.274 | -752085 | .300666 1.33038 | .38846 | .17492 
275 | -755415 | -299913 1.33294 | .38698 174375 
:276 | -758754 | -299155 1.33551 | .3855i | .173824 
.277 | .762101 | .298395 1.33809 | .3840; 173271 
.278 | .765456 | .297631 1.34068 eee 21 СЕ 16 
z .768820 | .296864 1.34328 | .38108 | .172159 
-772193 | -296094 1.34588 | .38069 | .171600 


0000+ 00000+ |. 000000+ 


ль ба À—————————— 


TABLE XV C (CONTINUED)


x | z | z/q | z/p | pq
.772193 | -296094 1.34588 | -38069 .171600 
:775575 | -295320 1:34849 | -37813 | ,171039 
.778966 | -294542 1.35111 | -37665 | -170476 
.782365 | -293762 1.35374 | -37517 | -169911 
:785774 | 292978 1.35638 | 37370 | -169544 
.789192 | .292790 1.35902 | .37222 .168775 
.792619 | .291399 1.36168 | -37074 | -168204 
.796055 | -290605 1.36434 | -36926 .167631 
.799501 | .289807 1.36701 | .36778 .167056 
.802956 | .289006 1.36970 | .36629 .166479 
.806421 | .288201 1.37239 | -36481 .165900 
.809896 | .287393 1.37509 | -36333 | -165319 
.813380 | .286582 1.37780 | -36185 .164736 
.816875 | .285766 1.38051 | -36036 . 164151 
.820379 | -284948 1.38324 | -35888 .163564 
.823894 | .284126 1.38598 | .35739 .162975 
.827418 | -283300 1.38873 | -35590 .162384 
.830953 | -282471 1.39148 | -35442 | -161791 
.834499 | .281638 1.39425 | -35293 | -161 196 
.838055 | -280802 1.39702 | -35144 | -160599 
.841621 | 279962 1.39981 | -34995 | -160000 
.845199 | .279118 1.40260 | .34846 | .159399 
.848787 | .278272 1.40541 | .34697 .158796 
.852386 | .277421 1.40823 | -34548 | -158191 
.855996 | -276567 1.41106 | .34399 | -157584 
.859617 | -275709 1.41389 | .34250 | -156975 
.863250 | -274847 1.41674 | .34100 .156364 
.866894 | .273982 1.41960 | .33951 | -155751 
.870550 | -273114 1.42247 | -33801 .155136 
874217. .272241 1.42535 | -33652 | -154519 
.877896 | .271365 1.42824 | -33502 | -153900 
.881587 | .270486 1.43114 | .33352 | -153279 
.885291 | .269602 1.43405 | -33202 2152656 
.889006 | .268715 1.43698 | -33052 | -152031 
° 
.892733 | .267824 1.43991 | -32902 | .151404 
.896473 | -266929 1.44286 | -32752 | -150775 
.900226 | .266031 1.44582 | .32602 | .150144 
:903991 | .265129 1.44879 | -32452 | -149511 
907770 | .264223 1.45177 | -32301 .148876 
911561 | 263313 1.45477 | -32151 | -148239 

.262400| .т8о | 1.45778 | -32000 .147600 | 4 
TABLE XV C (CONTINUED)


x | z | z/q | pq


915365 


.262400 


1.45778 


.147600 


«019183 
.923014 
.926859 


930717 
934589 
938476 
942376 


‚946291 
«950221 


261483 
260562 
:259637 
.258708 
257775 
256839 


255898 
1254954 
.254006 


1.46080 
1.46383 
1.46688 


1.46993 
1.47300 
1.47609 


1.47918 
1.48229 


1.48541 


-146959 
146316 
145671 


-145024 
-144375 
143724 
143071 
142416 
-141759 


954165 


253054 


1.48855 


.I41100 


.958125 
962099 
:966088 


.970093 
974114 
978150 


982203 
.986271 
:990356 


:994458 


252097 
.251137 
:250173 


:249205 
248233 
247257 


246277 
1245292 
:244304 


1.49170 
1.49486 
1.49804 


1.50123 
1.50444 
1.50766 


1.51090 


1.51415 
1.51742 


140439 
-139776 
139111 


138444 
-137775 
.137104 


-136431 
135756 
-135079 


243312 


1.52070 


134400 


:998576 
1.002712 
1.006864 


1.011034 
1.015222 
1.019428 


1.023651 
1.027893 


1.032154 


1.036433 


«242315 
«241315 
240310 
239301 
.238288 
237270 
236249 
235223 
-234193 


:233159 


1.52399 
1.52731 
1.53064 


1.53398 
1.53734 
1.54071 


1.54411 
1.54752 
1.55095 


-133719 
-133036 
132351 
-131664 
-130975 
130284 


2129591 
.128896 
.128199 


1.55439 | 


.127500 


1.040732 
1.045050 
1.049387 


1.053744 
1.058122 


1.062519 


1.066938 
1.071377 
1.075837 


232120 


231077 
‚230030 


.228979 


:227923 
.226862 


.225798 
.224728 


.223655 


1.55785 
1.56133 
1.56483 
1.56835 
1.57188 
1.57543 
1.57901 


1.58259 
1.58621 


.126799 
.126096 
-125391 
.124684 
*.123975 
.123264. 


.122551 
.121836 
121119 


1.080319 


1222577 


1.58983 


2120400 


x 
TABLE XV C (CONTINUED)


x | z | q | z/q | z/p | pq


1.080319 | .222577 | .140 | 1.58983 .25881 | .120400 
1.084823 | .221494 | -139 1.59348 | .25725 | .119679 
1.089349 | .220407 .138 | 1.59715 .25569 | .118956 
1.093897 | -219315 | -137 1.60084 | .25413 | -118231 
1.098468 | .218219 .136 | 1.60455 | .25257 | .117504 
1.103063 | .217119 | -135 1.60829 | .25100 | .116775 
1.107680 | .216013 | -134 1.61204 | .24944 | .116044 
1.112321 | .214903 | .133 1.61581 | .24787 | -115311 
1.116987 | .213789 | -132 1.61961 | .24630 | .114576 
1.121676 | .212669 | .131 1.62343 | .24473 | .113839 
1.126391 | .211545 | «130 1.62727 | .24316 | .113100 
-1.131131 | .210416 | .129 1.63113 | .24158 | .112359 
1.135896 | .209283 | .128 1.63502 | .24000 | .111616 
1.140688 | .208145 | .127 1.63894 | .23842 | .110871 
1.145505 | -207001 | .126 1.64286 | .23684 | .110124 
1.150349 | .205853 | -125 1.64683 | .23526 | .109375 
1.155221 | .204701 | 4124 1.65081 | .23368 | .108624 
1.160120 | 203543 | -123 | 1.65482 | .23209 .107871 
1.165047 | .202380 | .122 1.65885 | .23050 | .107116 
1.170002 | .201213 | -121 1.66292 | .22891 | .106359 
1.174987 | .200040 | .120 1.66700 | .22732 | .105600 
1.180001 | .198863 | .119 | 1.67112 | .22572 .104839 
1.185044 | .197680 | .118 1.67525 | .22413 | -104076 
1.190118 | .196493 | -117 1.67943 | .22253 | .103311 
1.195223 | .195300 | .116 1.68362 | .22093 | .102544 
1.200359 |.194102 | .115 | 1.68785 | .21932 .101775 
1.205527 |.192900 | .114 | 1.69211 | .21772 | «101004 
1.210727 | .191691 113 | 1.69638 | .21611 | .100231 
1.215960 | .190478 | .112 | 1.70070 .21450 | .099456 
1.221227 | .189259 | 111 | 1.70504 .21289 | ,.098679 
1.226528 | .188036 | „тто | 1.70941 .21128 | .097900 
1.231864 | .186806 | .109 1.71382 | .20966 | .097119 
1.237235 | -185572 .108 | 1.71826 | .20804 096336 
1.242642 | .184332 | -107 1.72273 | .20642 | -095551 
1.248085 | .183087 | -106 | 1.72724 .20480 | .094764 
1.253565 | -181836 | -105 1.73177 | -20317 | -093975 
1.259084 | .180579 | -104 1.73634 | .20154 | 093184 
1.264641 | .179318 | -103 | 1.74095 | -19991 .092391 
1.270237 | .178050 | -102 | 1.74559 .19827 | .091596 
1.275874 | -176777 | -101 | 1.75027 .19664 | .090799 


1.281552 


175498 


.00000+ .00000+|.000000+ 


1.75198 | .19500 | 000000 
TABLE XV C (CONTINUED)


x | z | q | z/q | z/p | pq


1.281552 


175498 


лоо 


1.7550 


. 000008 
00000] Е!! 


. 000001 Ed .0000+ 


1.287271 | .174214 | .099 | 1.7597 | -19336 | .089199 
1.293032 | .172924 | .098 | 1.7645 .19171 | .088396 
1.298837 | .171628 | .097 | 1.7694 .19006 | .087501 
1.304685 | .170326 | .096 | 1.7742 18841 | .086784 
1.310579 | -169018 | .095 | 1.7791 418676 | .085975 
1.316519 | .167705 | .094 | 1.7841 .18510 | .085164 
11322505 | .166385 | .093 | 1.7891 118345 | .084351 
1.328539 | .165060 | .092 | 1.7941 .18178 | .083536 
1.334622 | .163728 | .ogr | 1.7992 .18012 | .082719 
1.340755 | -162391 | .одо | 1.8043 | .17845 | «081000 
1.346939 | .161047 | .089 | 1.8095 17678 | .081079 
1.353174 | .159697 | .088 | 1.8147 17511 | .080256- 
1.359463 | .158340 | .087 | 1.8200 17343 | -079431 
1.365806 | .156978 | .086 | 1.8253 17175 | .078604 
1.372204 | .155609 | .085 | 1.8307 .17006 | .077775 
1.378659 | .154233 | .084 | 1.8361 .16838 | .076944 
1.385172 | .152851 | .083 | 1.8416 .16669 | .076її1 
1.391744 | .151463 | .082 | 1.8471 .16499 | .075276 
1.398377 8 | .o81 | 1.8527 16329 | .074439 
1.405072 .о8о | 1.8582 16159 | .073600 
1.411830 | ‚147258 | .079 | 1.8640 15989 | .072759 
1.418654 | .145843 | .078 | 1.8698 15818 | .071916 
1.425544 .077 | 1.8756 .15647 | .071071 
1.432503 |.142991 | .076 | 1.8815 | .15475 | .070224 
1.439531 | .141555 | .075 | 1.8874 -15303 | -069375 
1.446632 | .140112 | .074 | 1.8934 15131 | .068524 
1.453806 | .138662 | .073 | 1.8995 14958 | .067671 
1.461056 | .137205 | .072 | 1.9056 .14785 | .066816 
1.468384 1.9118 414611 | .065959 
1.475791 | . 1.9181 14437 | .065100 
1.483280 | 132788 | .069 | 1.9245 .14263 | .064239 
1.490853 | .131301 | .068 | 1.9309 414088 |, .063376 
1.498513 | .129807 | .067 | 1.9374 -13913 | .062511 
1.506262 | .128304 | .066 | 1.9440 -13737 | .061644 
1.514102 | .126794 | .065 | 1.9507 :13561 | .060775 
4 1.522036 | 125276 | .064 | 1.9574 | -13384 | .059904 
1.530068 | 123750 | .063 | 1.9643 | .13207 | .059031 
1.538199 | 122216 | .062 | 1.9712 ооо ХХ: 
1.546433 | .120674 | .обт | 1.9783 -12851 | .057279 
1.554774 | -119123 | .0бо | 1.9854 12673 | .056400 


«00000 * 


«000000 


x 
TABLE XV C (CONTINUED)


x | z | z/q | z/p | pq


1.554774 


119123 


‚12673 


.056400 


1.563224 
1.571787 
1.580467 
1.589268 
1.598193 
1.607248 


1.616436 
1.625763 


1.635234 


1.644854 


1.654628 
1.664563 
1.674665 


1.684941 
1 695398 
1.706044 
1.716886 
1.727934 
1.739198 


1.750686 


.117564 
.115996 
.114420 
.112836 
.111242 
.109639 
.108027 
‚106406 
.104776 
103136 
.1or486 
.099826 
:098157 
:096477 
094787 
.093086 
:091375 
.089052 
.087919 


1.762410 


1.774382 
1.780614 


1.799118 
1.811011 
1.825007 


1.838424 
1.852180 
1.866296 


.086174 


«084417 
.082649 
.080868 


:079075 
.077270 
:075452 
.073620 


:071775 
‚069915 


2.1645 


2.1750 
2.1856 


2.1965 
2.2077 
2.2192 


2.2309 
2.2430 


2.2553 


1.880794 


,068042 


2.2681 


1.959964 
1.977368 
1.995393 
2.014091 
2.033520 


2.053749 


.066154 
,064250 
.062332 
.060397 


:058445 
.056476 
:054490 
.052485 
.050462 


.048418 


2.2812 
2.2946 
2.3086 


2.3230 
2.3378 
2.3532 
2.3691 
2.3857 
2.4030 
2.4209 


.0000+ 


12494 
12314 
12134 


.11953 
11772 
11590 
11407 
11224 
„11041 
.10856 


.10672 
.10486 


„00000: 


:055519 
.054636 
053751 
.052864 
:051975 
.051084 
«050191 
«049296 
«048399 


046599 
-0451 
-044791 


.043884 
:042975 
.042064 
.о41151 
:040236 
:039319 


:037479 
.036556 
.035631 


:034704 
:033775 
.032844 
.031911 
030976 
«030039 
«029100 


.028159 
.027216 
1026271 


:025324 
024375 
.023424 
.022471 
021516 
.020559 


«000000 
TABLE XV C (CONTINUED)


x | z | pq


2.053749 | 048418 | - Ё “ .019600 


«046, К и А :018639 
ХОНЬ E 2 - ,017676 
.042160 | . B 4 .016711 


2.144411 | .040028 | . М 5 015744 
оо .037870 | - Ё 2 014775 
2.197286 | .035687 |. Ч д «013804 


2.226211 | .033475 | - 2 4 .012831 
2.257129 | .031234 | - М 4 :011856 
2.290370 | -028960 | . 2 E :010879 


2.326348 | .026652 | . Д 5 .009900 


2.365618 | . à 5 4 3 | -008919 
2.408916 |. С Y 4 .007936 
2.457264 | . К 5 ! «006951 


2.512144 |. 4 Р «от «005964 
2.575829 |. E Ё E :004975 
2.652070 | . 4 А < «003984 


2.747781 | - I T А .002991 
2.878161 |. К 5 М :001996 
3.090229 | . E Ї : :000999 


.00000+ ай 


SECTION 4. SQUARE AND CUBE ROOTS 


The methods given in Chapter XIII, Section 13,
for extracting square and cube roots on a
computing machine are particularly expeditious when
one starts with a good first approximation. The
two-figure argument table herewith provides such
an approximation, especially if the approximate
root is obtained by linear interpolation.
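
The value of a good starting figure can be seen in a short sketch. Chapter XIII, Section 13, is not reproduced here, so the refinement step below uses Newton's iteration as a stand-in and is an assumption, not necessarily the author's method; the function names are likewise illustrative. The starting value is obtained by linear interpolation between two adjacent entries of the two-figure table.

```python
def interpolated_start(n, n_low, root_low, n_high, root_high):
    """Linear interpolation between two adjacent table entries to get a first approximation."""
    return root_low + (root_high - root_low) * (n - n_low) / (n_high - n_low)


def refine_square_root(n, start, iterations=2):
    """Newton's iteration for the square root (assumed method): r <- (r + n/r) / 2."""
    r = start
    for _ in range(iterations):
        r = (r + n / r) / 2
    return r


def refine_cube_root(n, start, iterations=2):
    """Newton's iteration for the cube root (assumed method): r <- (2r + n/r**2) / 3."""
    r = start
    for _ in range(iterations):
        r = (2 * r + n / (r * r)) / 3
    return r


if __name__ == "__main__":
    n = 2.37
    # Table entries for N = 2.3 and N = 2.4.
    sqrt_start = interpolated_start(n, 2.3, 1.51658, 2.4, 1.54919)
    cbrt_start = interpolated_start(n, 2.3, 1.32001, 2.4, 1.33887)
    print(sqrt_start, refine_square_root(n, sqrt_start))
    print(cbrt_start, refine_cube_root(n, cbrt_start))
```

Since each Newton step roughly squares the relative error, a start that is already good to four figures needs very few steps.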
TABLE XV D
SQUARE AND CUBE ROOTS

 N     √N      √(10N)     ∛N     ∛(10N)  ∛(100N)
1.0  1.00000  3.16228  1.00000  2.15443  4.64159
1.1  1.04881  3.31662  1.03228  2.22398  4.79142
1.2  1.09545  3.46410  1.06266  2.28943  4.93242
1.3  1.14018  3.60555  1.09139  2.35133  5.06580
1.4  1.18322  3.74166  1.11869  2.41014  5.19249
1.5  1.22474  3.87298  1.14471  2.46621  5.31329
1.6  1.26491  4.00000  1.16961  2.51984  5.42884
1.7  1.30384  4.12311  1.19348  2.57128  5.53966
1.8  1.34164  4.24264  1.21644  2.62074  5.64622
1.9  1.37840  4.35890  1.23856  2.66840  5.74890
2.0  1.41421  4.47214  1.25992  2.71442  5.84804
2.1  1.44914  4.58258  1.28058  2.75892  5.94392
2.2  1.48324  4.69042  1.30059  2.80204  6.03681
2.3  1.51658  4.79583  1.32001  2.84387  6.12693
2.4  1.54919  4.89898  1.33887  2.88450  6.21447
2.5  1.58114  5.00000  1.35721  2.92402  6.29961
2.6  1.61245  5.09902  1.37507  2.96250  6.38250
2.7  1.64317  5.19615  1.39248  3.00000  6.46330
2.8  1.67332  5.29150  1.40946  3.03659  6.54213
2.9  1.70294  5.38516  1.42604  3.07232  6.61911
3.0  1.73205  5.47723  1.44225  3.10723  6.69433

MAXIMUM INTERPOLATION ERRORS IN THE
NEIGHBORHOOD OF THE INDICATED N'S

1.0, E''   .00031  .00099  .00028  .00060  .00130
1.0, E'''  .00002  .00007  .00002  .00004  .00020
2.0, E''   .00011  .00035  .00009  .00019  .00041
2.0, E'''  .00001  .00001  .00002
3.0, E''   .00007  .00020  .00005  .00010  .00022
3.0, E'''  .00001


TABLE XV D (continued)
SQUARE AND CUBE ROOTS

 N     √N      √(10N)     ∛N     ∛(10N)  ∛(100N)
3.0  1.73205  5.47723  1.44225  3.10723  6.69433
3.1  1.76068  5.56776  1.45810  3.14138  6.76790
3.2  1.78885  5.65685  1.47361  3.17480  6.83990
3.3  1.81659  5.74456  1.48881  3.20753  6.91042
3.4  1.84391  5.83095  1.50369  3.23961  6.97953
3.5  1.87083  5.91608  1.51829  3.27107  7.04730
3.6  1.89737  6.00000  1.53262  3.30193  7.11379
3.7  1.92354  6.08276  1.54668  3.33222  7.17905
3.8  1.94936  6.16441  1.56049  3.36198  7.24316
3.9  1.97484  6.24500  1.57406  3.39121  7.30614
4.0  2.00000  6.32456  1.58740  3.41995  7.36806
4.1  2.02485  6.40312  1.60052  3.44822  7.42896
4.2  2.04939  6.48074  1.61343  3.47603  7.48887
4.3  2.07364  6.55744  1.62613  3.50340  7.54784
4.4  2.09762  6.63325  1.63864  3.53035  7.60590
4.5  2.12132  6.70820  1.65096  3.55689  7.66309
4.6  2.14476  6.78233  1.66310  3.58305  7.71944
4.7  2.16795  6.85565  1.67507  3.60883  7.77498
4.8  2.19089  6.92820  1.68687  3.63424  7.82974
4.9  2.21359  7.00000  1.69850  3.65931  7.88374
5.0  2.23607  7.07107  1.70998  3.68403  7.93701


MAXIMUM INTERPOLATION ERRORS IN
THE NEIGHBORHOOD OF THE INDICATED N'S

3.0, E''   .00007  .00020  .00005  .00010  .00022
3.0, E'''  .00001
4.0, E''   .00004  .00008  .00003  .00006  .00013
5.0, E''   .00003  .00009  .00002  .00004  .00009


TABLE XV D (continued)
SQUARE AND CUBE ROOTS

 N     √N      √(10N)     ∛N     ∛(10N)  ∛(100N)
5.0  2.23607  7.07107  1.70998  3.68403  7.93701
5.1  2.25832  7.14143  1.72130  3.70843  7.98957
5.2  2.28035  7.21110  1.73248  3.73251  8.04145
5.3  2.30217  7.28011  1.74351  3.75629  8.09267
5.4  2.32379  7.34847  1.75441  3.77976  8.14325
5.5  2.34521  7.41620  1.76517  3.80295  8.19321
5.6  2.36643  7.48331  1.77581  3.82586  8.24257
5.7  2.38747  7.54983  1.78632  3.84850  8.29134
5.8  2.40832  7.61577  1.79670  3.87088  8.33955
5.9  2.42899  7.68115  1.80697  3.89300  8.38721
6.0  2.44949  7.74597  1.81712  3.91487  8.43433
6.1  2.46982  7.81025  1.82716  3.93650  8.48093
6.2  2.48998  7.87401  1.83709  3.95789  8.52702
6.3  2.50998  7.93725  1.84691  3.97906  8.57262
6.4  2.52982  8.00000  1.85664  4.00000  8.61774
6.5  2.54951  8.06226  1.86626  4.02073  8.66239
6.6  2.56905  8.12404  1.87578  4.04124  8.70659
6.7  2.58844  8.18535  1.88520  4.06155  8.75034
6.8  2.60768  8.24621  1.89454  4.08166  8.79366
6.9  2.62679  8.30662  1.90378  4.10157  8.83656
7.0  2.64575  8.36660  1.91293  4.12129  8.87904

MAXIMUM INTERPOLATION ERRORS IN
THE NEIGHBORHOOD OF THE INDICATED N'S

5.0, E''   .00003  .00009  .00002  .00004  .00009
6.0, E''   .00002  .00007  .00001  .00003  .00007
7.0, E''   .00002  .00005  .00001  .00002  .00005
TABLE XV D (continued)
SQUARE AND CUBE ROOTS

 N     √N      √(10N)     ∛N     ∛(10N)  ∛(100N)
7.0  2.64575  8.36660  1.91293  4.12129  8.87904
7.1  2.66458  8.42615  1.92200  4.14082  8.92112
7.2  2.68328  8.48528  1.93098  4.16017  8.96281
7.3  2.70185  8.54400  1.93988  4.17934  9.00411
7.4  2.72029  8.60233  1.94870  4.19834  9.04504
7.5  2.73861  8.66025  1.95743  4.21716  9.08560
7.6  2.75681  8.71780  1.96610  4.23582  9.12581
7.7  2.77489  8.77496  1.97468  4.25432  9.16566
7.8  2.79285  8.83176  1.98319  4.27266  9.20516
7.9  2.81069  8.88819  1.99163  4.29084  9.24434
8.0  2.82843  8.94427  2.00000  4.30887  9.28318
8.1  2.84605  9.00000  2.00830  4.32675  9.32170
8.2  2.86356  9.05539  2.01653  4.34448  9.35990
8.3  2.88097  9.11043  2.02469  4.36207  9.39780
8.4  2.89828  9.16515  2.03279  4.37952  9.43539
8.5  2.91548  9.21954  2.04083  4.39683  9.47268
8.6  2.93258  9.27362  2.04880  4.41400  9.50969
8.7  2.94958  9.32738  2.05671  4.43105  9.54640
8.8  2.96648  9.38083  2.06456  4.44796  9.58284
8.9  2.98329  9.43398  2.07235  4.46475  9.61900
9.0  3.00000  9.48683  2.08008  4.48140  9.65489
9.1  3.01662  9.53939  2.08776  4.49794  9.69052
9.2  3.03315  9.59166  2.09538  4.51436  9.72589
9.3  3.04959  9.64365  2.10294  4.53065  9.76100
9.4  3.06594  9.69536  2.11045  4.54684  9.79586
9.5  3.08221  9.74679  2.11791  4.56290  9.83048
9.6  3.09839  9.79796  2.12532  4.57886  9.86485
9.7  3.11448  9.84886  2.13267  4.59470  9.89898
9.8  3.13050  9.89949  2.13997  4.61044  9.93288
9.9  3.14643  9.94987  2.14723  4.62607  9.96655

 7.0, E''  .00002  .00005  .00001  .00002  .00005
 8.0, E''  .00001  .00004  .00001  .00002  .00004
 9.0, E''  .00001  .00004  .00001  .00001  .00003
10.0, E''  .00001  .00003  .00001  .00001  .00003
SECTION 5. SHORTER MATHEMATICAL TABLES LISTED
ACCORDING TO OCCURRENCE IN EARLIER CHAPTERS

                                                Page

Table IV M. The Number of Classes to Use
for Graphic Portrayal of Samples of
Size N Drawn from a Normal Population            133

Table VI G. Ratios of Certain Measures
of Variability to Their Standard Errors
in the Case of Samples Drawn from a
Normal Population                                232

Table VII A. Relative Frequencies in a
Pearson Type III, β₁ = 1, β₂ = 4.5,
distribution, for 20 intervals                   251

Table VII D. Optimal Interval, in Terms
of the Population σ, for the Computation
of the Parabolically Smoothed Mode,
for Samples of Size N Drawn from a
Normal Population                                259

Table VII G. Distribution of 1000 Measures
the Logarithms of which Yield a
Normal Distribution                              210

Table VIII C. Normal Probabilities               293

Table IX E. Table of @ Values (for
normalizing a variance ratio)                    326

Table X C. Corrections When Computing ρ
From Tied Ranks                                  369

Tables X H and X I. Normal Bivariate
Distribution, in which r = .80, in Case
of Coarse Grouping                               390

Table XI A. Non-Chance Differences for
Different Ratios, σ (due to chance) /
σ (observed)                                     418

Table XI B. Weighting Factors Consequent
to Values of the Reliability Coeffi-
cient

Table XI D. Relationship Between Corre-
lation and an Imposed Curtailment of
one of the Variables

Table XII A. Symbolic Expression of Con-
stants Derived in Modified Doolittle
Solution, 4 Variables

Tables XIII D and XIII E. Pearson Curve
Types

Table XIII F. The Equi-Probable Range
and the Mean Range, of Samples Drawn
from a Normal Population (in Terms of
the Population σ)

Table XIII H. The Size (in σ units) of
the Interval Giving a Probability of
.5 that the Frequency of the Median
Interval Shall Exceed that of Either
Neighboring Interval, in the Case of a
Normal Distribution; the Expected Range
(in σ units); and the Number of Classes
Necessary to Cover this Range

Table XIII I. Notation for Arguments,
Tabled Entries and Differences

Table XIV A. Basic Factorials and Gamma
Functions

Table XIV B. Of x = 2 sin⁻¹ √u
APPENDIX A
MATHEMATICAL BACKGROUND TEST


SECTION 1. THE BACKGROUND TEST


Herein is given a "Background Test for Ele-
mentary Statistics." The purpose of the test is
threefold: (a) to enable a student to secure an
idea as to whether his mathematical background
is adequate for the pursuit of an elementary
course in statistics, (b) to inform him of short-
comings in his background, if they exist, which
he should remedy by study of the appropriate
elementary topics as found in arithmetics and
elementary algebras, and (c) to enable an in-
structor to classify his students and properly
adapt instruction to them. The first two pur-
poses may be realized by the student himself
without the aid or knowledge of the instructor.
To secure information (a) it is necessary that
the student take the test in an entirely fair
manner. He must observe time limits which,
however, are not short. He must not coach
himself upon the items of the test by consulting
others who know the content of the test, or by
reference to the scoring key given on pages fol-
lowing the test. He must be rigorous in marking
his own paper. It may happen that the student's
answer will be different from that given in the
key, which is the sole guide for marking papers,
but different in a manner which the student real-
izes, though none other would, is not due to a
faulty understanding. In this case, he should
mark his answer wrong in spite of his personal
knowledge that he correctly understands the prob-
lem and its solution. If he is not thus objec-
tive and rigorous in the grading of his own paper,
he cannot interpret his position by comparing
his score with the percentile scores given.

Page 1 of the test, calling for certain per-
sonal information about the student, provides a
second, somewhat different and somewhat less rel-
iable means of estimating a student's prepara-
tion for pursuing elementary statistics. Items
a, c, d, e, and f are probably pertinent but are
not involved in the experience score as calcula-
ted from this page. Their inclusion on the page
will help the student to a rounded out picture
of himself and of his equipment. The items which
are scored and combined into an experience score
are X₁, X₂, X₃ as explained herewith.

X₁ is a score based on item (b), year in col-
lege, according to the following schedule:


X₁ score   Year in college
   1       Freshman
   2       Sophomore
   3       Junior
   4       Senior
   5       First year of graduate study
   6       Second year of graduate study
   7       Third year of graduate study


X₂ is a score based on item (g), the most ad-
vanced course in mathematics successfully pur-
sued, according to the following schedule:


X₂ score   Most advanced course in mathematics
   8       Arithmetic, i.e., no algebra
   9       1/2 year or 1 year of High School algebra
  10       Other high school mathematics, including a
           second course in algebra, trigonometry or
           geometry
  11       College algebra, or answer omitted
  12       Calculus
  13       One course beyond calculus
  14       Two or more courses beyond calculus


X₃ is a score based on related courses and
quality of work according to the following sched-
ule:


Item (h) credit   -1 for mark of D or F
                   0 for mark of C or I or no answer
                   1 for mark of A or B

Item (i) credit    0 for no course
                   1 for one or more courses (or un-
                     doubted equivalent in research work)

Item (j) credit   -1 for mark of D or F
                   0 for mark of C or I or no answer
                   1 for mark of A or B


X₃ = sum of credits
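
The three schedules above amount to simple lookups. The sketch below is only an illustration under stated assumptions: the dictionary keys and function name are hypothetical labels, not part of the test form, and the rule for combining the three components into the final experience score is deliberately left out because it is not shown here.

```python
# Hypothetical labels for item (b); values follow the X1 schedule.
X1_BY_YEAR = {
    "freshman": 1, "sophomore": 2, "junior": 3, "senior": 4,
    "graduate year 1": 5, "graduate year 2": 6, "graduate year 3": 7,
}

# Hypothetical labels for item (g); values follow the X2 schedule.
X2_BY_COURSE = {
    "arithmetic": 8, "first high school algebra": 9,
    "other high school mathematics": 10,
    "college algebra": 11, "answer omitted": 11,
    "calculus": 12, "one course beyond calculus": 13,
    "two or more courses beyond calculus": 14,
}

# Credit for a self-assigned mark, items (h) and (j); None means no answer.
MARK_CREDIT = {"A": 1, "B": 1, "C": 0, "I": 0, None: 0, "D": -1, "F": -1}


def x3_score(mark_h, has_statistics_course, mark_j):
    """Sum of the credits for items (h), (i), and (j)."""
    credit_i = 1 if has_statistics_course else 0
    return MARK_CREDIT[mark_h] + credit_i + MARK_CREDIT[mark_j]


print(X1_BY_YEAR["senior"], X2_BY_COURSE["calculus"], x3_score("A", True, "B"))
```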


The final experience score X_b is given by
A BACKGROUND TEST FOR ELEMENTARY STATISTICS


Parts I, II

FILL IN FULLY THE INFORMATION ASKED FOR ON THIS PAGE

(a) Name ______ Institution ______ Date ______

(b) If an undergraduate student, encircle the year in
college. 1 2 3 4 If a graduate student, encircle the
year of graduate work. 1 2 3

(c) Candidate for a degree? Yes. No. If so, for what
degree? ______

(d) Degree (or degrees) held ______ Institution ______
Date ______

(e) Teaching, research, or administrative position now
held, if any? ______

(f) Experimental, measurement, and statistical duties
connected therewith, if any? ______

Courses and experience connected with measurement,
and statistics:

(g) Most advanced course in mathematics pursued ______
Date ______ Institution ______

(h) Using the marking system: A = highest quarter; B =
next to highest quarter; C = next to lowest quarter;
D = lowest quarter; F = failure; I = incomplete,
course not finished; give yourself a mark to indicate
as nearly as you can your standing in the course just
mentioned ______

(i) Course or courses in measurement and statistics ______
Date ______ Institution ______

(j) As above, for each course, in the order mentioned,
give yourself a mark A, B, C, D, F, or I ______


SUMMARY

Parts I and II (equivalent score) ______

Parts III, IV, and V (equivalent score) ______

Total: Score on Pts. I, II, III, IV, and V ______
The purpose of this test is to enable an instructor of
statistics to estimate the sort of work which the students
of his class may pursue profitably, and to enable stu-
dents of statistics in general to appraise their own
mathematical equipment. Some students may be prepared
for advanced work. Therefore, the test contains items
which many students will be unable to do who are promising
candidates for an elementary course. No student should
feel discouraged, and each one should do all of those
items which lie within his capacity. Work as rapidly as
is consistent with accuracy.


INSTRUCTIONS TO ADMINISTRATOR OF TEST:


After 35 minutes working time announce, "Even
if you have not finished Part I start on Part II."


After another 7 minutes, or a total working
time of 42 minutes, call time.

An intermission is recommended before starting
Parts III, IV, and V.


SUMMARY

[Scoring box for Parts I and II: crude total score,
weight or multiplying factor, and equivalent score
from table below.]
PART I. COMPUTATION

A. Addition and Subtraction

Carry all answers to the greatest number of decimal
places found in the data of the problem, unless otherwise
specified. Designate the answers clearly.

1. Add 32.784, 764.45, 1.2733, .0056.
2. Add 73.8, -13.4, .632, -.076.
3. Subtract 17.384 from 8.2631.
4. Subtract .0758 from 1.6.
5. Subtract -.848 from 3 3/4.

B. Multiplication

Carry answers to as many decimal places as the sum of
the decimal places in the two factors. When the common
fractions given are changed to decimals express them to
the two nearest decimal places.

6. Multiply 73.18 by .062.
7. Multiply 13 1/6 by 4.74.
8. Multiply -1.14 by 14.32.
9. Multiply -.037 by 7/16.
10. Multiply 8/14 by 10/9. (Express answer as a
decimal to 3 places)

C. Division

Carry answers to three significant figures.

11. Divide .1145 by 3.76.
12. Divide 9 7/12 by 24.28.
13. Divide 1 3/4 by 72 4/9.

D. Roots and Powers

Carry square root answers to as many significant fig-
ures as given in the number the square root of which is
to be extracted.

14. Extract the square root of 5776.
15. Extract the square root of .00049.
16. Extract the square root of -169.
17. Expand (9)².

E. Percentage

Keep answers to the nearest integer.

18. Find 52 percent of 165.
19. Find .52 percent of 916.
20. The ratio of 3 to 4 equals the ratio
of 7.5 to what?

(Go right on to Part II)
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PART II. EXPONENTS 


Simplify the following: 


i (ante № I 
2. x'x? = 12, yb = 
m SESS) E ee 
4. 2(xty?)? = іу. УЗ у? = 
5 
х 
Е оо —— 
ip oi 
y E 
6. = 16. v 22° = кесу = 
+ у? 
Е er йш ae 
6 
8. m* m = 18. 2- = 
ёо 
т? nt 
т 5 IEEE icc 
Зєс-ар 
10. (4x3)? = 20222122: 
Е. -х(с- d)? 


(Go right on to Part III) 
A BACKGROUND TEST FOR ELEMENTARY STATISTICS
PARTS III, IV, V


Name ______ Institution ______
Date ______


INSTRUCTIONS TO ADMINISTRATOR OF TEST:
After 35 minutes working time announce, "Even
if you have not finished Part III start on Part IV."


After another 2 minutes announce, "Even if you
have not finished Part IV start on Part V."


After another 15 minutes, or a total working
time of 52 minutes, collect all booklets.


[Scoring box for Parts III, IV, and V: number right,
weight, and (no. right) × (weight) for each part;
total Part V; crude total Parts III, IV, and V; and
equivalent score from table below.]
PART III. ALGEBRA

1. Add 5x + 8 and 3x + 4y - 3.
2. Add 4x² - 3x + xy - 4 and 3x² + 6x - y² - 2.
3. Subtract 4x³ - 5x² + 3x - 5 from x⁴ + 9x³ - 2x + 3.
4. Multiply 4x² by 7x.
5. Multiply 4x² + 3 by -4xy.
6. Multiply 4x² - 3x - 5 by 2x - 4.
7. Expand (m - n)².
8. Factor m² - n².
9. Write from memory the first 4 terms of (m + n)^a. (Do
not derive)
10. Solve the equation 6x - 12 = 0.
11. What are the roots of x² - 6x + 8 = 0?
12. What are the roots of 6x² + 2x - 20 = 0?


13. The number of real or imaginary roots of 5х2 + 3х? +x 
изор 


15. 


16. 


17. 


18. 


20. 
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2 
a 
Given the formula Ё Sb: What value will £ have 


if a 7 8 and b - 2? 


БЭР? 
w= Пе. М GP?) = 80, and X=5, solve 


КСК? ~ з) 


for w to two decimal places. 


D= . 1€ D= 25, A = 170, G, = 4, 94, = 84, 


solve for 4. 


The two sides of a right triangle are 8 and 1.8. 
What is the length of the hypotenuse? 


The quotient of two numbers is y and divisor is n. 
Represent the dividend. 


ax² + bx + c = 0 is a typical quadratic equation,
the solution of which is given by

     x = [-b ± √(b² - 4ac)] / 2a

Express the roots of 5x² + 2x - 4 = 0 by means of
this formula, but do not perform the indicated
arithmetic computation.


Solve simultaneously 2х — Say ca. 
Зх+ y= 


! 
ол 


(Go right on to Part IV)
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2. 


ET 


5. 


6. 


PART IV. TRIGONOMETRY 


X 


What is the sine of angle X in the 
adjacent figure? 


The cosine of angle Y? —— 
The tangent of angle Y? 


What is the numerical value of the 
tangent of 45 degrees? — — 


кы Бы m ESSE 


The cosine of 0 degrees is — —— 


What is the greatest possible value 
of the sine of an angle? : 


————— 


(Go right on to Part V)
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PART V. ANALYTIC GEOMETRY 


1-6. If the quadrants made by perpendicular coordinate 
axes are numbered as in the diagram, indicate as 
result of inspection, in the column headed quadrant, 
the number of the quadrant in which each of the 
following points lies. 


2. 


7. A straight line is represented by an equation in x
and y of the ______ degree.

8. The line 5x + 4y = 12 intercepts the Y axis at what
point? ______

9. The slope of the line y = 5x - 3 is ______

10. The slope of the line 5x - 3y = 4 is ______

11. The slope of the line passing through the point
(x = 6, y = 7) and the point (x = 2, y = 5) is ______

12. 4y + 5x = -10 is the equation of a line. Shifting
axes 2 units to the right and 2 units up, and using
x and y to designate the new variables, what then
is the equation of the line? ______


РЕ ы or E ЫШ шысы е чыш 


РАВТ Т: 
SECTION 2. SCORING KEYS


A BACKGROUND TEST FOR ELEMENTARY STATISTICS


PARTS I, II

PART I: COMPUTATION
EACH ITEM WEIGHTED 1

1. 798.5229
2. 60.956
3. -9.1209
4. 1.5242
5. 4.598
6. 4.53716
7. 62.4258 or 62.41
8. -16.3248
9. -.01628 or -.0162 or -.016
10. .635 or .634 or .633
11. .030 or .03 or .0304 or .0305
12. .395 or .394
13. .024 or .0241
14. 76. or 76 or 76.00 or 76.0
15. .022 or .02214
16. 13√-1 or 13i
17. 81
18. 85. (credit for 85.80)
19. 5. (credit for 4.7632)
20. 10
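
Several of the keyed computation answers can be confirmed with exact decimal and fraction arithmetic. The sketch below is only a spot check on items 2 to 6, 8, 10, 14, and 20 as the items are worded in Part I; the dictionary name `computed` is a hypothetical label, and the remaining items are left out because they depend on rounding conventions.

```python
from decimal import Decimal as D
from fractions import Fraction as F
from math import isqrt

# Item number -> value computed from the wording of the item in Part I.
computed = {
    2: D("73.8") + D("-13.4") + D(".632") + D("-.076"),
    3: D("8.2631") - D("17.384"),
    4: D("1.6") - D(".0758"),
    5: D("3.75") - D("-.848"),              # 3 3/4 written as a decimal
    6: D("73.18") * D(".062"),
    8: D("-1.14") * D("14.32"),
    10: round(float(F(8, 14) * F(10, 9)), 3),
    14: isqrt(5776),
    20: F(15, 2) * F(4, 3),                 # 3 : 4 = 7.5 : x, so x = 7.5 * 4/3
}

for item, value in computed.items():
    print(item, value)
```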


20. 


PART ЇЇ: EXPONENTS 
EACH ITEM WEIGHTED 1/6 


w or 1024 


x2 


m 


| 
m? п? 
15 x ог (0° х 
10 


n^n? or 


x 


yj or y? 
ont xX 
х?у* or х2 vy 
or + either 


2a-2 


x or үд band) 


-3х (c-d) or -3xc + 3xd 
SCORING KEY
A BACKGROUND TEST FOR ELEMENTARY STATISTICS
PART III


PART III: ALGEBRA
EACH ITEM WEIGHTED 1

1. 8x + 4y + 5
2. 7x² + 3x + xy - y² - 6
3. x⁴ + 5x³ + 5x² - 5x + 8
4. 28x³
5. -16x³y - 12xy
6. 8x³ - 22x² + 2x + 20
7. m² - 2mn + n²
8. (m - n)(m + n)
9. m^a + a m^(a-1) n + [a(a-1)/2] m^(a-2) n²
   + [a(a-1)(a-2)/6] m^(a-3) n³
   (the divisor 2 may be written 2!, and the divisor 6
   may be written 3!)
10. 2
11. 2, 4
12. 5/3 (or 1 2/3), -2


3 
Bes 2, 
шэг 
-.36, огт 
30 
8.2 or √67.24
ny
-24/8p  -2+vy FBO 
— ог —————— 2022 ecl 


10 10 
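
The keyed roots for the quadratic items and the hypotenuse item can be checked numerically. The sketch below is an illustration only; the helper `quadratic_roots` is a hypothetical name, and it assumes real roots (a non-negative discriminant), which holds for the three equations in Part III.

```python
import math


def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 by the quadratic formula (real roots assumed)."""
    root = math.sqrt(b * b - 4 * a * c)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


print(quadratic_roots(1, -6, 8))    # x² - 6x + 8 = 0
print(quadratic_roots(6, 2, -20))   # 6x² + 2x - 20 = 0
print(quadratic_roots(5, 2, -4))    # 5x² + 2x - 4 = 0, i.e. (-2 ± √84)/10
print(math.hypot(8, 1.8))           # right triangle with sides 8 and 1.8
```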
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SCORING KEY 
A BACKGROUND TEST FOR ELEMENTARY STATISTICS 
PARTS IV, V 


PART Iv: TRIGONOMETRY 
EACH ITEM WEIGHTED 1/3 


E 
11255 
2 САУ 

2 
зз 

х ` 
uod 
Bal 
6. | | 


PART V: ANALYTIC GEOMETRY 
1-6. 


(each weighted 1/6) 


оЕЕ-ОФЮ 


7. 1st or linear (weight 1/3)
8. 3 or y = 3 (weight 2/3)
9. 5 (weight 1)
10. 5/3 (weight 1)
11. 1/2 or .5 (weight 2)
12. 4y + 5x = -28 (weight 4)
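
The analytic geometry answers for items 8 to 12 follow from a line written as ax + by = c. The sketch below is a minimal check under that assumption; all three helper names are hypothetical, and "shifting axes h units right and k units up" is taken to mean that the old coordinates equal the new ones plus (h, k).

```python
from fractions import Fraction


def slope_of_line(a, b):
    """Slope of the line a*x + b*y = c (b must not be zero)."""
    return Fraction(-a, b)


def slope_through(p, q):
    """Slope of the line through points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return Fraction(y2 - y1, x2 - x1)


def shift_axes(a, b, c, h, k):
    """Coefficients of a*x + b*y = c after moving the origin to (h, k)."""
    # Substituting x = x' + h and y = y' + k leaves a and b unchanged.
    return a, b, c - a * h - b * k


print(Fraction(12, 4))                  # item 8: Y intercept of 5x + 4y = 12
print(slope_of_line(5, -3))             # item 10: 5x - 3y = 4
print(slope_through((6, 7), (2, 5)))    # item 11
print(shift_axes(5, 4, -10, 2, 2))      # item 12: 5x + 4y = -10
```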
SECTION 3. PERCENTILE NORMS AND RELIABILITY


Percentile equivalents are given in Table
Appx. A for the Experience Score and also for
the Total Background Test Score, based on 406
students beginning work in statistics from Colum-
bia University Teachers College, University of
Illinois, Harvard University, University of
Minnesota, University of Oregon, and Stanford
University. Percentiles for the Parts separately
are not provided, but the "equivalent" scores
are such that a person's Part I and II score
will equal his Part III, IV, and V score if he
is strictly typical or equally able in the sub-
ject matter of these different parts. In this
table, X_a is the Background Test Score and X_b the
Experience Score. The percentiles are based upon
406 students beginning elementary statistics in
courses at (a) Columbia University (Teachers
College), N = 52; (b) University of Illinois,
N = 39; (c) Harvard University (Graduate School
of Education), N = 29; (d) University of Minne-
sota, N = 88; (e) University of Oregon, N = 153;
(f) Stanford University, N = 45.

The standard error of X_b as a measure of
whatever it is a measure is unknown because a
second similar measure is not available nor can
the elements entering into X_b be divided into two
comparable halves and reliability determined by
the Spearman-Brown formula. The standard error
of X_a when taken as evidence of the true score
X̃_a is 1.97.
This is equivalent to a reliability coefficient
of .80 in the group of 406, having a standard
deviation of 4.40. The correlation between X_a
and X_b for the 406 cases is .65. Certain small
samples suggest that the correlations between a 
term grade in an elementary course in statistics 
and X, and X, scores, received four months 
earlier, are in the neighborhood of .4 and .6 
respectively. 
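
The link between a standard error, a reliability coefficient, and a standard deviation can be illustrated numerically. The sketch below assumes the usual relation, standard error = σ√(1 - r); that formula and the function name are assumptions brought in for illustration, since the text quotes only the figures.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Standard error of a score given its standard deviation and reliability."""
    return sd * math.sqrt(1 - reliability)


# Figures quoted in the text: standard deviation 4.40, reliability .80.
print(round(standard_error_of_measurement(4.40, 0.80), 2))
```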




TABLE Appx. A. PERCENTILES 


X 


b 


TEST SCORE 


Р, 

Iss Pees 30 33 
Ру Ро 31 34 
Р Pas 32 34 
Ро Р ву 34 35 
Pos Pas 37 36 
DES P3, 39 37 
рз», D. 42 38 
P s 45 40 
Pius Pas Ч 


LU 


APPENDIX B 


REFERENCE LISTS 
SECTION 1. LIST OF COMMON STATISTICAL SYMBOLS 


Herein are given the more common meanings at-
tached to symbols as used in this text, and also
the more common meanings as used in statistical
texts in general.

Throughout, as used in this text, a bar over
a symbol, e.g., V̄, indicates an unbiased estimate
or a mean value, and a tilde, e.g., Ṽ, a popula-
tion, hypothetical, or true value.


THE GREEK ALPHABET


Α α Alpha      Ι ι Iota       Ρ ρ Rho
Β β Beta       Κ κ Kappa      Σ σ Sigma
Γ γ Gamma      Λ λ Lambda     Τ τ Tau
Δ δ Delta      Μ μ Mu         Υ υ Upsilon
Ε ε Epsilon    Ν ν Nu         Φ φ Phi
Ζ ζ Zeta       Ξ ξ Xi         Χ χ Chi
Η η Eta        Ο ο Omicron    Ψ ψ Psi
Θ θ Theta      Π π Pi         Ω ω Omega
LIST OF COMMON STATISTICAL SYMBOLS - LITERAL SYMBOLS

a, A, and α

a,- the constant term in a regression equation.

a,- a and b are employed in connection with
boundary values of a sequential analysis
likelihood ratio.

a_ij,- an element in a matrix at the inter-
section of the i-th row and the j-th column,
Ch. XIV, Sec. 2. [14:01]

A,- Arbitrary Origin, as employed by Yule and
Kendall.

A_ij,- a cofactor. Ch. XIV, Sec. 2. [14:12]

A,- common notation for a matrix. Ch. XIV,
Sec. 2. [14:01]

A',- a matrix that is the transpose of A.
[14:02]

A⁻¹,- a matrix that is the inverse of A.
[14:06]

α,- the proportion of cases in a cell in a
2×2-fold. The other proportions are β, γ,
and δ.

α,- in Pearson's Tables, the area between mean
and point of dichotomy in a unit normal
distribution = α/2.

α,- a measure of risk in sequential analysis.


b, B, and β

b,- a regression coefficient.

b,- see a (sequential analysis).

b₁₂,- an abridged notation for b₁₂.₃₄...k.
A regression coefficient in a multiple re-
gression equation.

β,- a standard score regression coefficient.

β,- a measure of risk in sequential analysis.

β₁₂,- an abridged notation for β₁₂.₃₄...k.
A regression coefficient in a multiple
standard score regression equation.

B(p,q),- The Beta Function. See Table XV A.
1934-1938 Biom.

B_x(p,q),- The incomplete Beta Function. See
Table XV A. 1934-1938 Biom.
See I_x(p,q).

B, = n/a: В, = а/ы; Ву = has мо; Pus из 

epo Шз; 8, = ug/ DÀ in which the ив аге 
moments from the mean, and м, =У as used 
in this text. 


These β's are used especially in connec-
tion with the Pearson system of curves.


C (for γ and Γ see "G"; for χ see end of al-
phabet)


c,- as a preceding subscript indicates a cor-
rection for coarseness of grouping.


c_ij,- the covariance, or product moment, be-
tween the i and j variables.

C₋₃, C₋₂, C₋₁, C₀, C₁, C₂, C₃, C₄,- Lagrangian
interpolation coefficients.

C,- a contingency coefficient: C² = φ²/(1 + φ²)
ЛЕХ.) 

Сур а cofactor. 

Cii, is used in this text to mean ZX, X. МУ. 
Зее с; 


cm,- as a preceding subscript indicates a cor-
rection for use of class means.

C(n,m), or nCm, or C^n_m, or n written over m in
parentheses,- is the number of combinations
of n things m at a time.

C(n,m) = n! / [m!(n-m)!]. See P(n,m).
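
The combinations formula, together with the standard permutations formula P(n,m) = n!/(n-m)!, can be checked against the Python standard library. This is an illustrative sketch; the two function names are hypothetical, while `math.comb` and `math.perm` are the library routines used for comparison.

```python
from math import comb, factorial, perm


def combinations_count(n, m):
    """C(n,m) = n! / [m!(n-m)!], the number of combinations of n things m at a time."""
    return factorial(n) // (factorial(m) * factorial(n - m))


def permutations_count(n, m):
    """P(n,m) = n! / (n-m)!, the number of permutations of n things m at a time."""
    return factorial(n) // factorial(n - m)


print(combinations_count(5, 2), comb(5, 2))
print(permutations_count(5, 2), perm(5, 2))
```

Each combination of m things can be arranged in m! orders, so P(n,m) = m! × C(n,m).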
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4, 0, 8, апа А 


d,- a deviation. 


d,- a difference in scores, in ranks, etc. 


d,- (and also 4,,) the mean deviation of a por- 
tionofa unit normal distribution. [8:26] | 


апа [8:27] 


ї 
d_ij,- a difference between standard scores z_i
and z_j. [11:28]

d_x,- in actuarial statistics is the number
dying during the x year of life.

d.o.f.,- the number of degrees of freedom.


D,- a difference in rank when computing the 
correlation between ranks. 


8,- See first definitionofa. $ is value ta- 
bled in Pearson's Tables for different 
tetrachoric coefficients. 


-- 


δ and Δ frequently indicate small increments.
As they approach infinitesimals they ap-
proach the differentials of calculus.


δ_ij,- Kronecker's δ_ij is such a quantity that
it = 1 when the two subscripts are the same
and otherwise it = 0. If the a's in the
matrix

     a₁₁  a₁₂  ...  a₁ₙ
     a₂₁  a₂₂  ...  a₂ₙ
     ...
     aₙ₁  aₙ₂  ...  aₙₙ

are such that the sum of their squares for
any row, or column, = 1, and the sum of the
products for corresponding members for any
two columns, = 0, then

     Σ_k a_ki a_kj = δ_ij
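
The Kronecker delta property can be verified on a small matrix whose columns have unit sums of squares and zero cross products. The sketch below is an illustration under stated assumptions: a 2 × 2 rotation matrix is chosen as the example, and both function names are hypothetical.

```python
import math


def kronecker_delta(i, j):
    """1 when the two subscripts are the same, otherwise 0."""
    return 1 if i == j else 0


def column_products(a):
    """Matrix whose (i, j) entry is the sum over k of a[k][i] * a[k][j]."""
    n = len(a)
    return [[sum(a[k][i] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


theta = math.radians(30)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

products = column_products(rotation)
for i in range(2):
    for j in range(2):
        # Each sum of column products agrees with the Kronecker delta.
        print(i, j, round(products[i][j], 12), kronecker_delta(i, j))
```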


е, 
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A as a subscript means "dependent upon." It 
is the opposite of the dot as a subscript, 
which means "independent of." It is em- 
ployed herein in connection with scores, 
correlation coefficients, regression coef- 
ficients, variances, covariances, etc. 

A,- a major determinant. Л;; is the minor de- 
terminant after row 1 and column j have 
been deleted from ^. Ch. XIV, Sec. 2, 
[14:16]. 

A,- with superscripts isused to indicate dif- 
ferences in tabled values. Ch. XIII, Sec. 
12. Table XIII I. 


E, and Е 


e,- the Naperian base of logarithms. It = 


2.71828 18284 59045, etc. 


exp,- in an equation, the expression following 
"exp" is the exponent of e. 


E,- the expected value, the expectancy, ог the 
mean value. 


€,- a small quantity. 


€,- an unbiased correlation ratio. 


Е, p, and Ф 
f,- a frequency, or number of cases. 


f,- as a preceding subscript indicates a cor- 
rection for fineness of grouping. 
c 


f(x),- a function of x. 


f'(x),- the first derivative of f(x),- thus, 
if y = f(x), then dy/dx = f'(x); f",- the 
second derivative, etc. 


EVE үгс Ch. DX, See. ^5, 95291, 


= the frequency in the cell at the inter- 
section of the i row and the j column. 


A, 
F,- when computing percentiles, F = the sum of
f's up to a certain point.

F_ij,- a variance ratio having i d.o.f. in the
numerator and j d.o.f. in the denominator.
As herein used, the denominator variance
is the error variance. An alternative and
common usage is so to write F_ij that it > 1.

φ,- a common designation for an angle.

φ,- φ(x) a function of x.


φ,- Yule's φ is a product-moment correlation
coefficient in a 2×2-fold.


φ²,- is the mean square contingency. It = χ²/N.
This φ² = Yule's φ squared in case of a
2×2-fold only.


G, γ, and Γ


g,- as used by Fisher, g₁ = k₃/k₂^(3/2) and g₂ =
k₄/k₂². These are sample estimates of his
population parameters, γ₁ and γ₂.

G(r,ν) integral. See Ch. XV, Sec. 1. Pearson,
(1914).

γ,- Fisher's γ₁ = μ₃/μ₂^(3/2), and γ₂ = (μ₄/μ₂²) - 3.
As used by Fisher, Greek letters indicate
population, not sample values.


Γ,- the gamma function,- a concept of factorial
extended to numbers other than integers.


See T E 


H, and 7) 
h,- see К, 


ftc Т and h, are intercepts in sequential 
analysis. 


H,- hypothesis. 4 with following notation 
commonly means under the hypothesis indi- 
cated by the following notation. 


1, 
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7,- a correlation ratio. See є. 


I, and + 

i,- the imaginary quantity √-1. This univer-
sal use of i has not been called for in
this text.


i,- the size of the grouping interval.
i or r,- an interest rate.


i and j as subscripts of V and in other con- 
nections, are used herein to indicate the 
numbers of d.o.f. herein i’ andj’, аз sub- 
scripts of V, indicate the variancesof the 
i and j variables, —the prime being used 
to avoid conflict with the use of i and / 
to indicate d.o. f. 


i as a subscript of a variable, or connected 
with a summation sign, stands for any of 
a number of veriables, usually from 1 to 
k. j has a similar meaning for a second 
series of variables, usually from а to 4. 


I,- the area from the mean to the point x in 
the unit normal distribution. of Table 
XV C = a/2 of Sheppard’s table, given in 
Pearson's Tables. 


I,- the identity matrix: 44^! = I, Ch. XIV, 
Sec. 2, [14:05]. 

I(u,p) of Pearson's Tables of the Incomplete
Gamma Function = Γ_x(p+1) / Γ(p+1).


I_x(p,q) of Pearson's Tables of the Incomplete
Beta Function = B_x(p,q) / B(p,q).


j and / 


J,- see i. 


J, CO, = a common designation ofa Fessel func- 
tion of the first kind. 
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k, 


K, and « 

k,- is a coefficient of alienation. It = √(1 - r²).
k,- is the number of classes in a categorical 

series. Also the specific designation of 


the k-th class. 4 is used with a similar 
meaning if a second series is involved. 


Ki, ky, К.» К.» asused by Fisher, are unbiased 
sample estimates of cumulants Kis Koy ку, 
and «,. 


K,- Fisher's cumulants are population statis- 
A ~ 


tics. ку =4; КИ: к =Й, к, + 3x; = Dy 
in which the tilde measures are as used in 
this text. 

L, A, and A 


а direction cosine, usually in n-dimen- 
sional space. 


4,- see the second definition of k, 


4,;- in actuarial Statistics is the number of 
persons living at age x. 


L,- a likelihood ratio, 
log,- also loé,,,- logarithm to the base 10. 
In,- also 108,,- logarithm to the base e. 


À,- a direction cosine, usually in n-dimen- 
sional space. 


^,- a Lagrange multiplier. [14:118] 


A,- in the expression "A-shaped™ indicates а 
unimodal distribution. 


M, and μ

M,- the arithmetic mean.

μ,- a moment from the mean: μ₁ = (Σx/N) = 0;
μ₂ = Σx²/N; μ₃ = Σx³/N; μ₄ = Σx⁴/N; etc.

μ',- a moment from zero: μ'₁ = ΣX/N; μ'₂ = ΣX²/N;
μ'₃ = ΣX³/N; μ'₄ = ΣX⁴/N; etc.
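
The two kinds of moment differ only in the origin from which the scores are measured. The sketch below is an illustration with made-up data; the function names and the sample values are hypothetical.

```python
def moment_about_zero(scores, order):
    """Mean of the scores raised to the given power (moment from zero)."""
    return sum(x ** order for x in scores) / len(scores)


def moment_about_mean(scores, order):
    """Mean of the deviations from the mean raised to the given power."""
    mean = moment_about_zero(scores, 1)
    return sum((x - mean) ** order for x in scores) / len(scores)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative sample, not from the text
print(moment_about_zero(data, 1))    # the arithmetic mean
print(moment_about_mean(data, 1))    # always zero
print(moment_about_mean(data, 2))    # second moment from the mean
# Second moment from the mean = second moment from zero minus mean squared.
print(moment_about_zero(data, 2) - moment_about_zero(data, 1) ** 2)
```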


п, №, and v 


р, 


п, - d.o.f., i.e., n is synonymous with i as 
used herein. 


n,- a common designationofthe last item in a 
Series,- i.e., n is synonymous with К as 
used herein. 


n,- a common designation of the exponent of a 
binomial, or polynomial. 


n,- number of cases in a sample (though № is 
more frequently so used). 


n,- average sample number. 
N,- the number of cases in a sample. 


v,- Greek nu, sometimes used with the meaning 
of р’ preceding. 


| о апа О 


о,- as a subscript in connection with index 
numbers, it commonly refers to the basal 
date. 


о,- as a subscript in Ч, indicates the null 
hypothesis. 


о,- a common designation of the criterion vari- 


able. 

OC,- operating characteristic curve. 

P, π, and Π

p,- a probability, i.e., a proportion of cases.
If but two classes, q is the other propor-
tion, so that p + q = 1. If k classes, p₁ +
p₂ + ... + p_k = 1. Some writers employ
p + q + r + s = 1, in lieu of p₁ + p₂ + p₃ + p₄ = 1.


р,- a price index. 
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P,- in interpolation, the proportionate dis- 
tance of the valueinauestion from the ta- 
bled argument next smaller to the distance 
between the two neighboring arguments. 

P,- in a dichotomized unit normal distribution, 
P, аз herein used, is the proportion to the 
left and а that to the right of the point 
of dichotomy. As tabledin this text, p>q. 

P;j.- a product-moment, usually involving de- 
viations from means. Frequently p,, is 
defined as is Сү, herein, Another usage 
defines Рүүүур аз = 5 xixixkx!/¥, in which 
the exponents may take any positive inte- 
gral values, including that of zero. Karl 
Pearson has used p,.,, to mean Ру; кг, аз 

: ijkl тук 
here defined. 

р;;,- а price ratio, — Ве price at date i di- 
vided by the price at date j. See 91: 


P~ in actuarial statistics, the probability 
of a person of age x living one year. 


Р,- а probability, usually of a divergence as 
great in absolute value as that observed. 
With one d.o.f. and a normal distribution, 
Р = 29, as given in the first definition 
under p. 

P,- a price index. 

P,j,- a product moment. Sometimes P,; is de- 


fined as Су; herein. As herein used, P, jki 

= EX XSXEXI/N, Karl Pearson uses P; jki to 
mean P;;,,,, as here defined. 

P(n,m), or nPm, or P^n_m,- is the number of per-
mutations of n things m at a time. P(n,m)
= n!/(n-m)!. See C(n,m).


9, 


г, 
P_p,- a percentile,- P_p is the (100p) percentile.


π,- the ratio of circumference of circle to
diameter. It = 3.14159 26535 89793 etc.


Π,- as a symbol of operation means the product
of all those magnitudes immediately fol-
lowing, thus Π X_i = X₁ × X₂ × ... × X_k.

2 , 

g,- see first and third definitions of p. 

Чур,т а quantity ratio, —the quantity at date 
i divided by the quantity at date J. See 
Pans 

Gris the area in a unit normal distribution 
between x, and x; 

q_x,- in actuarial statistics is the probability
of a person of age x dying within one year.
q_x = d_x/l_x.

Q,- a quantity index.

Q,- a quotient.

Q,- a Lexian ratio, which, when i = d.o.f., =
χ²/i = the variance ratio F.

R, and р 

r,- а product-moment correlation coefficient. 


r,- with preceding and following subscripts, 
other than those designating variables, a 
correlation coefficient with various cor- 
rections and under special conditions. 

r,- а ratio. 


г, ог i,- an interest rate. 

r,,,- may be called a total correlation coef- 
ficient to distinguish it from partial cor- 
relation coefficients. 

I2,3» Tio;54» fio.345: etC., arepartial cor- 
relation coefficients. 
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TiAo3! Тудозц, etc., аз herein used, designate 


multiple correlation coefficients. See Ch. 
XI, Sec. 3. 


T3,23* l'j,23«» etc., have been used to desig- 


nate multiple correlation coefficients. 
Such usage is avoided in this text for 
reasons given in Ch. XI, Sec. 3. 


T1(231* Тїүгэнээ etc.,- ап alternative nota- 


tion to Ti^23» Гуләзц, etc. 


Ку |03), 1,554)» etc., are an alternative no- 


р, - 


tation for Гүлрэ» Г Дозц» etc. 


designates the correlation coefficient, 
based upon the squares of differences in 
rank. 


з, S, c, and X 


S,- 


R. A. Fisher uses s to indicate an unbiased 
estimate of o, which, in his usage, is a 


population value. Thus s^ = W/(N-1) X V 
as used herein. 


as a preceding subscript to r, or R, in- 
dicates a correction for shrinkage. 


the indifferent value of a proportion, or
of a mean, which is also the slope of
boundary lines in sequential analysis.


as a symbol of operation, indicates a sum-
mation. As used herein, Σ is a summation
with reference to N, the cases in the sam-
ple, and S a summation with reference to
the number of classes, or the like.


the standard deviation, which equals the 
square root of the variance. Аз used by 
R. A. Fisher, c is a population, not a sam- 
ple, value, 
Σ,- as a symbol of operation it indicates a
summation. See S. In this text Σ indicates
a summation of N terms unless specially
noted, thus $\sum^{N-1}$ is a summation of (N-1)
terms, and $\sum_{i=1}^{k}$ is a summation as i takes
all values from 1 to k. This is also com-
monly written $\sum_{1}^{k}$. In this text the still
more abridged notation $\sum^{k}$ is sometimes used.
Great care is necessary with double and
triple summations. $\sum_{i=1}^{u} \sum_{j=1}^{v} a_{1i} a_{2i} a_{1j} a_{2j}$
is a summation of uv terms in which 1 and
2 do not vary and i takes all values from
1 to u simultaneously in $a_{1i}$ and $a_{2i}$, and
for each value of i all values of j from
1 to v (including the case in which j = i)
simultaneously in $a_{1j}$ and $a_{2j}$. If case
i = j is to be excluded, it must be indi-
cated, usually by notation at the right of
the equation, j ≠ i. Another common situa-
tion is when j takes all values less than
i, in which case the notation j<i is re-
corded. In this case a double summation
ΣΣ, j<i, would have u(u-1)/2 terms.
Summations are sometimes indicated in math-
ematical treatises (seldom in statistical)
by the use of a dummy index, subscript or
superscript. When this index appears twice
in an expression, it represents a summa-
tion, thus $a_{ij} a_{ik} = a_{1j} a_{1k} + a_{2j} a_{2k} + \dots + a_{nj} a_{nk}$,
in which 1, 2, ... n are all the
possible values that i can take.
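
The term counts quoted above for double summations, and the dummy-index convention, can be illustrated with a short sketch. The function names are hypothetical and serve only to count or add terms by brute force.

```python
from itertools import product

def double_sum_terms(u, v):
    """Count the terms of an unrestricted double summation, i = 1..u, j = 1..v."""
    return sum(1 for _ in product(range(1, u + 1), range(1, v + 1)))

def restricted_terms(u):
    """Count the terms of a double summation over i, j = 1..u restricted to j < i."""
    return sum(1 for i in range(1, u + 1) for j in range(1, u + 1) if j < i)

def dummy_index_sum(a, b):
    """Sum a[i] * b[i] over the repeated (dummy) index i."""
    return sum(ai * bi for ai, bi in zip(a, b))

print(double_sum_terms(4, 3))                   # 12, i.e. u * v
print(restricted_terms(5))                      # 10, i.e. u(u-1)/2
print(dummy_index_sum([1, 2, 3], [4, 5, 6]))    # 32
```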
Σ,- the standard deviation. When two standard
deviations, not readily definable by dif-
ferent subscripts, are involved, σ and Σ
may be used. E.g., σ designates a stand-
ard deviation in a narrow range sample and
Σ designates it in a wide range sample.

t, T, τ, θ, and Θ

t,- "Student's" t is a critical ratio, a de-
viation in terms of its standard deviation.
The t-distribution is symmetrical, but non-
normal, because, and only because, the
number of cases in the sample is small.
$t^2 = F_{1,j}$; see $F_{ij}$.
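
The identity $t^2 = F_{1,j}$ can be checked on a small sample. The sketch below assumes a one-sample test of a hypothesized mean, with the variance estimated on N-1 d.o.f.; the function name and the data are illustrative, not taken from the text.

```python
from math import sqrt
from statistics import mean, variance

def t_and_f(sample, hypothesized_mean=0.0):
    """Return Student's t and the variance ratio F on (1, N-1) d.o.f."""
    n = len(sample)
    deviation = mean(sample) - hypothesized_mean
    s2 = variance(sample)              # unbiased variance, N-1 d.o.f.
    t = deviation / sqrt(s2 / n)       # deviation in terms of its standard error
    f = (n * deviation ** 2) / s2      # variance on 1 d.o.f. over variance on N-1 d.o.f.
    return t, f

t, f = t_and_f([4.1, 5.3, 3.8, 6.0, 5.1], hypothesized_mean=4.0)
print(abs(t * t - f) < 1e-9)           # True
```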

t,- a tabled entry. Ch. XIII, Sec. 12, Table 


KLIVI, 
θ,- a common designation for an angle.

$\theta_i$,- as used herein, $\theta_i$ is the function de-
pending on i, the d.o.f., involved in nor-
malizing a variance ratio. See Ch. IX,
Sec. 5. [9:23]


u, U, υ, and Υ

u,- commonly used to indicate a tabled entry;
see "t".

v and V

$v_p$,- in connection with percentiles, $v_p$ is the
value of the lower limit of the interval
in which the (100p) percentile lies.

V,- the variance: $V_1$ is the variance of the
first variable, $V_2$ that of the second, etc.
For exception, see $V_i$ and $V_j$.

$V_i$ and $V_j$,- as here used in connection with
variance ratios, $V_i$ indicates a variance
based upon i d.o.f. and $V_j$ one based upon
j d.o.f.
$V_{q,p}$,- in connection with a normal distribution,
is the variance of the portion of a unit
normal distribution lying between $x_q$ and
$x_p$. See [8:29].

w and W

w,- designates a weight, e.g., $w_1$, $w_2$, $w_3$ may
designate the weights attaching to the
first, second, and third variables.

x, X, ξ, and Ξ

x,- a score as a deviation from the mean.

x,- not infrequently a standard score, as in
Table XV C.

x,- not infrequently a raw score, i.e., X
as herein used.

X,- a raw score.
- the mean of the X's. 


ξ,- a score as a deviation from an arbitrary
origin.


y and Y

Same meanings for y, Y, and η, but pertaining
to a second variable, as given for x, X, and ξ
as pertaining to a first variable.

Y (x),- a common designation of a Bessel func-
tion of the second kind.

z, Z, and ζ

z,- not infrequently a standard score.

z,- the ordinate in a unit normal distribution.

z,- R. A. Fisher's z. See Ch. XV, Sec. 1,
1938, Fisher and Yates.


(,- see "y". 


φ and Φ

Φ,- defined under f.
χ,- the square root of $\chi^2$. $\chi^2$, though fre-
quently derivable from categorical data
and frequencies in classes, is, with vary-
ing degrees of closeness, the sum of a num-
ber of independent squared deviates of a
unit normal distribution. $(\chi^2/i) = F_{i,\infty}$;
see $F_{ij}$.

As herein used, i designates the number of
degrees of freedom in $\chi^2$.


ω and Ω

ω,- as a subscript is used in connection with
corrections for attenuation to indicate a
true score in the second variable. ∞ in-
dicates the true score in the first vari-
able.
SECTION 2. CERTAIN LESS COMMON NON-LITERAL SYMBOLS

≐, ≈, or ≅,- is approximately equal to.

~,- is asymptotically equal to. Frequently, in
statistical equations, the nearness to equal-
ity increases as N increases.


<=,- is equivalent to, in terms of the designated 
function relationship; thus if two achievement 
tests are equated on the basis of age norms, 
we might write X = 78 <= Y = 152, meaning that 
78 is the same age norm on test X as is 152 
on test Y. 


∝,- varies as.

>,- is greater than. 

<,- is less than. 

≥,- is greater than, or in the limiting case, is
equal to.

≤,- is less than, or in the limiting case, is
equal to.


|x|,- the absolute value of x, or the value re- 
gardless of sign. 


x! or ⌊x,- factorial x. If used when x is not
an integer, it can then represent Γ(x+1), but
if so used, a specific statement in the ac-
companying text is necessary.

Γx,- see under Greek letter gamma.


A·B,- the product of A and B. Also, in dealing
with vectors, this is the "dot product," or
scalar product of A and B.


$|a_{ij}|$,- the determinant whose element in the ith
row and jth column is $a_{ij}$.

$\|a_{ij}\|$,- the matrix whose element in the ith row
and jth column is $a_{ij}$.

$f(x)\big]_a^b = f(b) - f(a)$


а 
|abc...| is the determinant |p 
c 


||abc...|| is the matrix with the same elements as
in the determinant |abc...|.


= (а+Ь) 


∞,- as a subscript in connection with corrections
for attenuation, is used to indicate a score
in the first variable having no chance error.
ω is similarly used with reference to the
second variable.

→,- approaches, thus n→∞ is read, "as n approaches
infinity."


)i(,- reversed parentheses in a sequence written
1,2,...)i(...,k asserts that all the values,
1, 2, etc., to k, except the value i, are in-
cluded. E.g., should i have the specific
value 2, the sequence would be 1,3,4,...,k.
Where no ambiguity results, the commas may be
omitted and this last series written 134...k.

[],- not infrequently square brackets are used to
designate the mean of whatever is within the
brackets. E.g., for a sample of N, $[X] = \Sigma X/N = M$.
$[x_1 x_2] = (\Sigma x_1 x_2)/N = c_{12}$. Textual ex-
planation of this use of [] is necessary, for
otherwise it may be interpreted as a symbol
of aggregation.
SECTION 3. A FEW MATHEMATICAL TERMS (not elsewhere
defined) COMMONLY HAVING STATISTICAL SIGNIFICANCE


Affine,- an affine transformation is of the form
$y_1 = a x_1 + b x_2 + c$
$y_2 = A x_1 + B x_2 + C$

When $\begin{vmatrix} a & b \\ A & B \end{vmatrix} = (aB - Ab) \neq 0$, this covers the
fundamental types of general utility in sta-
tistics: translations, rotations, stretchings
and shrinkings, and reflections. The affine
transformation carries parallel lines into
parallel lines and finite points into finite
points. The rotation is an isogonal affine
transformation in that it does not change the
size of angles.
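
The statement that an affine transformation carries parallel lines into parallel lines can be illustrated numerically. In the sketch below the coefficient values and the helper names are arbitrary choices made for illustration; only the form of the transformation and the determinant (aB - Ab) come from the definition above.

```python
def affine(point, a, b, c, A, B, C):
    """Apply y1 = a*x1 + b*x2 + c, y2 = A*x1 + B*x2 + C to a point (x1, x2)."""
    x1, x2 = point
    return (a * x1 + b * x2 + c, A * x1 + B * x2 + C)

def determinant(a, b, A, B):
    """The determinant (aB - Ab), which must differ from zero."""
    return a * B - A * b

def direction(p, q):
    """Direction of the line through points p and q."""
    return (q[0] - p[0], q[1] - p[1])

def parallel(d1, d2, tol=1e-12):
    """Two directions are parallel when their cross product vanishes."""
    return abs(d1[0] * d2[1] - d1[1] * d2[0]) <= tol

coef = dict(a=2, b=1, c=3, A=-1, B=4, C=0)       # illustrative coefficients
print(determinant(2, 1, -1, 4))                   # 9, so the condition holds

# Two parallel lines, each given by two points.
p, q, r, s = (0, 0), (1, 1), (0, 2), (1, 3)
tp, tq, tr, ts = (affine(pt, **coef) for pt in (p, q, r, s))
print(parallel(direction(tp, tq), direction(tr, ts)))   # True
```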


Bernoullian numbers,- they are: $B_1 = \frac{1}{6}$, $B_2 = \frac{1}{30}$,
$B_3 = \frac{1}{42}$, $B_4 = \frac{1}{30}$, $B_5 = \frac{5}{66}$, $B_6 = \frac{691}{2730}$,
$B_7 = \frac{7}{6}$, $B_8 = \frac{3617}{510}$, ...
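
The tabled values can be reproduced with exact fractions. The sketch below assumes that the Bernoullian numbers listed here are the absolute values of the even-indexed Bernoulli numbers of the modern convention, $B_n = |B_{2n}|$; the recurrence used is the standard one, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def bernoulli_modern(count):
    """Return B_0 .. B_{count-1} in the modern convention (B_1 = -1/2)."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        partial = sum(comb(m + 1, k) * numbers[k] for k in range(m))
        numbers.append(-partial / (m + 1))
    return numbers

def bernoullian(n):
    """Older convention assumed here: the n-th Bernoullian number is |B_{2n}|."""
    return abs(bernoulli_modern(2 * n + 1)[2 * n])

print([str(bernoullian(n)) for n in range(1, 9)])
# ['1/6', '1/30', '1/42', '1/30', '5/66', '691/2730', '7/6', '3617/510']
```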


Combination,- the number of possible combinations
of n different things m at a time is designated
C(n,m), or $_nC_m$, or $C^n_m$, or $\binom{n}{m}$. C(n,m) = n!/
[(n-m)! m!]. See Permutation.


Factor,- a factor as used in mental factor analy-
sis, is a linear function of the scores upon
a number of measures (usually mental tests)
which, in relation to other factors, i.e.,
other linear functions of the same scores, has
certain unique properties. Different con-
siderations lead to different factorial solu-
tions. Involved are concepts of (a) indepen-
dence, orthogonality and maximal variance (Ho-
telling, Kelley, et al.), (b) simple structure
(Thurstone, et al.), (c) social importance
(Kelley), and various hypotheses, general
(Spearman, et al.), group (Thomson, et al.),
bi-factor (Holzinger, et al.) etc., which qual-
ify the conditions which are imposed.


Geometric series,- one in which the ratio of each
term to the following is constant. Let the geo-
metric series be $a + ar + ar^2 + \dots + ar^n$. The
sum of these (n+1) terms is $a(1 - r^{n+1})/(1 - r)$.
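
The closed form for the sum can be compared with term-by-term addition. This is a minimal sketch using exact fractions; the function names and the sample values of a, r, and n are illustrative.

```python
from fractions import Fraction

def geometric_sum_direct(a, r, n):
    """Add the (n+1) terms a, ar, ..., ar^n one by one."""
    return sum(a * r ** k for k in range(n + 1))

def geometric_sum_closed(a, r, n):
    """Closed form a(1 - r^(n+1)) / (1 - r); r = 1 is handled separately."""
    if r == 1:
        return a * (n + 1)
    return a * (1 - r ** (n + 1)) / (1 - r)

a, r, n = 3, Fraction(1, 2), 4
print(geometric_sum_direct(a, r, n))   # 93/16
print(geometric_sum_closed(a, r, n))   # 93/16
```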


Integral function,- one which can be so written
that the variable, or variables, have positive
exponents and do not appear in the denominator
of any term.

Integration,- is the process of summing parts (the
infinitesimal elements of area of the calculus)
to obtain a whole. An integral is the result
of an integration. A definite integral is the
result of integration between definite, or
assigned, values.


Lexian ratio,- designated Q by Lexis = $\chi^2$/d.o.f.

Logarithm,- if given $y = a^x$, then x is the loga-
rithm of y to the base a. Two bases are in
common use, the base e and the base 10, yield-
ing natural logarithms designated ln, or $\log_e$,
and common logarithms designated log, or $\log_{10}$.
With base e we have $y = e^x$ and the important
property $dy/dx = e^x$ holds. With base 10 we have
$y = 10^x$ and the important property
$\log (10^p y) = p + \log y$ holds, thus log 172. = 2 + log 1.72
= 2.235528... in which 2 is called the charac-
teristic and .235528... the mantissa. p is a
positive or negative integer.


The relationship between natural (or Napierian)
and common (or Briggs) logarithms:


log x = .43429 44819 03252... ln x
ln x = 2.30258 50929 94046 ... log x
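
The characteristic and mantissa of the example, and the two conversion constants, can be checked with the standard library. The function name below is illustrative.

```python
import math

def characteristic_and_mantissa(y):
    """Split the common logarithm of y into its integer and fractional parts."""
    common = math.log10(y)
    characteristic = math.floor(common)
    return characteristic, common - characteristic

c, m = characteristic_and_mantissa(172.0)
print(c, round(m, 6))            # 2 0.235528

# Conversion constants between natural and common logarithms.
print(math.log10(math.e))        # approximately .4342944819
print(math.log(10))              # approximately 2.3025850930
```

For y less than 1 the characteristic returned is negative and the mantissa remains between 0 and 1, in keeping with the remark that p may be a negative integer.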


Orthogonal Transformation,- one which transforms 
from one set of rectangular coordinates to a 
second set. E.g., simple rotations and trans- 
lations. 


Permutation,- the permutations of n different
things m at a time consist of all possible com-
binations m at a time and for those in each
combination all possible arrangements. The
number of such is designated P(n,m), or $_nP_m$,
or $P^n_m$. P(n,m) = n!/(n-m)! See Combination.
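
The formulas for C(n,m) and P(n,m), and the relation between them (each combination of m things admits m! arrangements), can be checked as follows. The function names are illustrative; `math.comb` and `math.perm` are used only for comparison.

```python
from math import comb, factorial, perm

def combinations_count(n, m):
    """C(n,m) = n! / [(n-m)! m!]."""
    return factorial(n) // (factorial(n - m) * factorial(m))

def permutations_count(n, m):
    """P(n,m) = n! / (n-m)!."""
    return factorial(n) // factorial(n - m)

n, m = 5, 2
print(combinations_count(n, m))                    # 10
print(permutations_count(n, m))                    # 20
print(permutations_count(n, m) ==
      combinations_count(n, m) * factorial(m))     # True
```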


Quadrature,- the process of finding a rectangular
area equal to that of a given surface. Usually 
one only of the bounds of this surface is 
curved. 

Ratio test for convergence of an infinite series,-
if the absolute value of the ratio of the
(n+1)th term to the nth term as n→∞ is less
than 1, the series is convergent; if equal to
1, there is no test; and if greater than 1,
the series is divergent.
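
The rule can be sketched in code, with the caution that the limiting ratio must be found analytically: a ratio computed at a finite n only suggests the limit. The function names and the example series are illustrative.

```python
def successive_ratio(term, n):
    """Absolute ratio of the (n+1)th term to the nth term."""
    return abs(term(n + 1) / term(n))

def ratio_test(limit):
    """Apply the rule to the limiting ratio, which must be supplied by the user."""
    if limit < 1:
        return "convergent"
    if limit > 1:
        return "divergent"
    return "no test"

halving = lambda n: 1 / 2 ** n          # terms 1/2, 1/4, 1/8, ...
print(successive_ratio(halving, 50))    # 0.5
print(ratio_test(0.5))                  # convergent

harmonic = lambda n: 1 / n              # the ratio tends to 1, so the rule is silent
print(ratio_test(1))                    # no test
```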

Rational function,- one which can be written so 
that the variable, or variables, do not have 
fractional exponents, or equivalent radical 
form. 


Rational integral function,- one which can be 
so written as to be both rational and integral. 


Vector,- a line having both magnitude and direc- 
tion. 
SECTION 4. KEY TO FORMULAS IN ORDER OF OCCURRENCE


KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
Re percentiles: 146 
P PV f, 
L, and F 
P [4:01] 146 
5 апа £ 149 
ор and q (4: 03] 149 
P. 
А generalized mean [6: 01] 201 
V [6: 03] 203 
[6:07] 20% 
V [6:04] 203 
: 16: 05] 203 
16:46) 217 
с [6:06] 203 
x 16:44) 217 
И 16:09) 204 
V and V [6:10] 205 
^ 
M [6: 11] 206 
x [6: 14] 208 
Ў [6: 15] “09 
18:151 209 
9y [6: 18] 209 
УС! M,) [6: 19] 210 


Moment from fixed point [6: 20] 210 


KEY TO FORMULAS 


KEY PHRASE OR SYMBOL 


My 


И 

ГА 

Fisher's notation: 
mi, т, т, m, 
Hi Ho Hy» and py 


Yule'and Kendall's 
notation: А and & 


Күу- Fisher 


FORMULA NUMBER 


[6:21] 
[6: 22] 


[6:23] 


212 
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KEY PHRASE OR SYMBOL 


ky Fisher 
k.,- Fisher 
ks- Fisher 


K,,- Fisher 
Kas- Fisher 
K>,- Fisher 


Күут Fisher 


i,- interval 

Arb. Or. 

X,- a deviate from И 
Hy 

My 


Y,- Sheppard's 
correction 

s^4»7 Sheppard's 
correction 


5, 
5, 
"s 
3, 
И s= cf. [13: 147] 
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FORMULA NUMBER 
16:35) 
[6:36] 
[6:37] 
[6:38] 
[6:39] 
[6: 40] 
(6: 41) 
[6:42] 


[6:43] 
[6: 45] 
[6: 47] 
[6:48] 
[6: 49] 


[6:50] 


[6:51] 
[6:52] 
[6:53] 
[6:54] 


[6:55] 
[6:56] 
[6:57] 
[6:58] 


224 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
12 [ 6:59] 224 
[13:77] 524 
[13:79] 525 
[13:81 525 
о, 6:60] 224 
p [ 6:61 224 
2 [ 6:62 225 
y [ 6:63] 225 
" 5 
: 6: 64 225 
A.D.,- average deviation [ 6:65] 227 
[ 6:68] 229 
A.D. from P [ 5:66] 228 
6: 67] 228 
h,- number of measures 228 
below P 
о (AD. from Р) [ 6:69 229 
с (А.0.) [ 6:70] 230 
а (Рр, ‚ ),- covariance [ 6:71] 231 
between percentiles 
уб р) [ 6:72] 231 
Р,, - percentile measure [ 6:73] 231 
of variability 
V [ 6:74] 231 
V 
U,- quartile deviation — [ 6:75] 232 
и : [ 8:75] 232 
Re Y, <, i, and Arb. Or. [ 7:20] 260 
йо, [ 7:21) 260 
f,- ordinate in best [ 7:23] 261 


fit. parabola > 


702 REFERENCE LISTS 


KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
Yo, (7: 24] 261 
И.) [7: 27] 262 
у. [7:28] 262 
V [7:29] 265 
Geometric mean [7:30] 265 
Harmonic mean [7:31] 272 
Differential equation [8:01] 288 
of normal curve 
Z,- unit normal [8:02] 288 
distribution [10:19] 348 
D,- in a normal [8:03] 288 
distribution 
Y,- normal distribution [8:04] 289 
А [8: 34] 301 
Шуут normal [8:06] 290 
A generalized mean [7:01] 23% 
Wan [7:02] 240 
B,,- Pearson [7:03] - Quy 
Equation of distribution [7:05] 245 
for which qu = Oan 
Ku,- percentile measure [7:08] 246 
of kurtosis 
И [7:09] 246 
Mesokurtic Ки 17:111 246 
Bır- Pearson 17: 12| 249 
бүу- Fisher [7:13] 249 
|А [2:14] 249 


SA 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
А [7: 15] 250 
А.,- percentile measure [7:16] 250 
of asymmetry 
и [7:17] 250 
Sk,- percentile measure 17:18 250 
of skewness 
Ve, [7:19] 251 
Mean deviation,- normal [8:07] 290 
[8: 20] 292 
ро = Y,- normal [8:08] 290 
Mean !x^|,- normal [8:09] 291 
Цул normal (8: 10] 291 
Mean [x^|,- normal [8:11] 291 
Has normal [8: 12] 291 
Her- normal [8:13] 291 
Hg ,- normal [8:14] 291 
Изо» - normal [8: 15] 291 
В,,- normal (Pearson) [8: 16] 291 
В,,- normal (Pearson) [8:17] 29 | 
Уүул normal (Fisher) [8: 18] 291 
Yos normal (Fisher) [8: 19] 291 
Q,- normal quartile [8:21] 292 
deviation 
Pv,- normal [8: 22] 292 
х (or z),- a standard [8: 23] 293 


score 




KEY PHRASE OR SYMBOL 
Р.Е. 

г 
у',- normal 


d,- mean deviation of 
normal tail portion 


d,,,- mean deviation of 
normal portion 


зат] 
Иш) 
iy,- normal 


mean Д2 


Ai (огу) 


F,j,- variance ratio 
Fig,- a function of 12 


Mean class frequency 


Variance of class 
frequency 


t= class frequency 
Hys - class frequency 
8,,- class frequency 


Д,,- class frequency 


FORMULA NUMBER 


[8:24 ] 
[8:2ца] 


[8:25 ] 
[8:26 ] 


[8:27 ] 


[8:28 ] 
[8:29 ] 


[8:284] 
[8:285] 
[8:31 ] 
[8:33 ] 
[8:35 ] 
[8:36 ] 


[8:37 ] 
[8:38 ] 
[9: 21 ] 
[8:39 ] 
9:01 ] 
9:02 ] 


[9:03 ] 


19:04 ] 
[9:05 ] 


[9:06 ] 


296 


297 


298 
298 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 

X? and contingency [9:07] to [9:11] 318 
table notation,- 
ах, [9:16] to [9:20] 325 
о d. 

Linear restriction [9:12] to [9:13] 322 

Е; j,- variance ratio [9:21] 326 

fij [9:22] 326 

Ө, [9: 23] 326 

d normalizing [9:24] 327 
transformation 

X normalizing [9: 25] 327 
transformation 

Band Р, [9:26] 329 

Additive property [9:27] 331 
of 1218 

Х, (10:01 1 333 

[10:0 [а] 333 

a) : . [10:02 ] 333 

by, [10:03 ] 333 

X, and X, (See [10:04 1 333 
also [11:97]) [10:042] 333 

Z and Z} [10:05 ] 333 


[10:052] 333 


KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE


Preceding subscript Table X A 33% 
notation indicating 
corrections in cor- 
relation coefficients: 


ар» а, Cyr Cy, C, 


с) сі, 5 
X,,- quadric regression (10:06) 335 
r [10:07] 339 
Comparable measures [10:08] 340 
Өү [10:09] 3%2 
[10:24] 350 
. [10:49] 364 
Y,- bivariate normal [10: 10] 343 
distribution [10:21] 348 
A subscript notation 344 
(See also Chapter XI, 
Sec. 3, pp. 435-436) 
У,,- analysis of variance [10:11] 345 
(See also [11:99] ) [10:16] 3u6 
[10:17] 347 
Үүл 110: 12] 346 
[10: 18] 347 
LN [10: 15] 346 
X) [10:23] 3u9 
Or [10:09] 342 
[10:24] 350 


[10:49] 364 




KEY PHRASE OR SYMBOL 


b,, and b, 


12 


C,,,- covariance 


12! 
d.o.f. equation 


ZX = a linear 
restriction 


=X, X- a linear 
restriction 
Null hypotheses 
re M, and Бо, 


Е - re the mean 


1,N-2? 


Fy y-or~ ге the 


FORMULA NUMBER 


[10: 25] 
[10:26] 
[10:30] 
[10:31] 


[10:27] 
[10: 29] 


[10:28] 
[10: 32] 
[10: 33] 


[10: 34] 


[10: 36] 
[10:37] 


[10:38] 


[10:39] 


regression coefficient 


V, 


м’ 


V(5,) 


- correlated data 


Е. (н_о) re hypothesis 
г= 0 


2,- Fisher's г into 2 
e 

(z-z) critical ratio 
д, ,- cf. [13:145] 


[10:40] 
[10:41] 
[13: 143] 
[10:42] 


[10:43] 
[10:43а] 


[10:44] 


[10:45] 


[10:46] 
[10:462] 


PAGE 


350 
350 
352 
352 
351 
352 


351 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 


22 adjustment for [10:48] 362 
coarseness of grouping 
V(X,),- variance of [10:50] 364 


а point оп the re- 
gression line 


Difference formula for r [10:52] 365 
Sum formula for r [10:53] 366 
Sum and difference formula [10:54] 366 
Mean of М ranks [10:55] 366 
V of N ranks [10:57] 366 
P,- rank г [10:58] 367 
r from р [10:59] 367 
e; [10: 60] 367 
Оүу? f from p [10:61] 367 
г from Pearson's р [10: 62] 368 
Horn's correction for [10:63] 368 
tied ranks 
Applying Horn's correction [10:64] 369 
Ч, +- dichotomous series [10:65] 371 
Vibe dichotomous series [10: 66] 371 
Сү»? У is dichotomous [10: 67] 371 
and x continuous 
b [10: 68] 372 
5, [10:69] 372 
г, у" biserial product [10:70] 372 


тотепї г 




KEY PHRASE OR SYMBOL


| ><] 


1,ц-2 for the difference 

etween means of the 
upper and lower classes 
of the dichotomy 


bye ‚- У’ assumed 
continuous 
Tı,- biserial г 


х 


у! 
V. 
y 


.х 


У (biserial г) 


ООЛУ бур, а, P's Q' 
in 2X2-fold defined 
Chae 2X2-fold 


P,- product moment г 
in 2X2-fold 


hy 

X mee 

E cz] Хо. test (р-р) 
Prts) to test (b, -5,) 
у 


T, ,- tetrachoric г 


FORMULA NUMBER 
[10:7 1] 

[10:72] 

[10: 73] 


[10:74] 


[10:75] 
[10: 76] 
[10:77] 


[10:78] 
[10:79] 


[10:80] 


[10: 81] 


[10: 82] 
[10:83] 
[10: 84] 
[10: 85] 
[10: 86] 


[10:87] 




PAGE 
372 
372 
372 


374 


374 
374 
375 


375 
375 


380 


381 


381 
381 
381 


381 
382 


383 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
Vir.) [10:88 | 386 
г, ,- when dichotomic [10:89 ] 386 
lines are at the [10: 89а] 387 
medians 
У (г. ) [10:90 | 387 
Ty „ эт T between class [10:91 ] 393 
means and continuous 
variate 
WV [10:92 ] 393 
c n 
[10:93 ] 393 
пуп [10:94 ] 393 
ic [10:95 ] 393 
[3 лт 
X, +> а weighted sum 110: 96а] 395 
10: 96b] 395 
Е [10:97 | 395 
Cap 10:98 ] 395 
27 [10:99 | 395 
[10: 100] 396 
[10: 101] 396 
Үүд» Jj» Ep average [10: 100] 396 
intercorrelations 
aj [10: 102] 397 
[10: 105] 397 
fes [10: 103] 397 
V, ,- variance of [10: 106] 398 


a difference 




KEY PHRASE OR SYMBOL FORMULA NUMBER 

Defining an obtained [n:01 J 
measure Хү, а true [11:012] 
measure X,, and an [11:02 ] 
error in the obtained 
measure е, 

Defining half scores X; [11:03 1 


(or 22) алд х, 


2 


Defining second variable [11:016] 
X,, half scores X, and 


X, and true scores x, 


Defining a criterion [11:04 J 
variable х,, and a true 
criterion variable X, 


V(x, -Xj) [11:05 ] 
9,.,:- Rulon's standard [11:06 ] 
error of estimate 
formula 


г, ,- Kuder-Richardson reli- [11:07 1 
ability coefficient 


n = г,, [11:08 ] 

T,- Ѕреагтап-8 гомп [11:09 1 
formula 

г. [11:092] 

n [11:095] 

y [1 1:09c] 

г, ут Зреагтап-Вгомп 11:10 1 


formula 




PAGE 


401 
402 
402 


402 


403 


403 


403 


404 


405 


406 


406 


406 
406 


407 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
a [1:11] 407 
т 

23 [11:12] 407 
n 

У (sum) [11:13] 407 
У ,- variance error [ris du] 407 


of estimate 


M 111:151 408 
о 
y, [11:16] 408 
(eb [1:17] 409 
2) THIS 409 
х, [11:19] 409 
X. [11:20] 409 
ys [11:21] 410 
У.у,- variance error of [11:22] 410 
estimate when X, is 
regressed 
CCS [11:23] 412 
Toji- correction Тог attenu- [11:24] 412 
ation in one variable 
To. correction for [11:25] 412 
attenuation [13:85] 527 
хи [11:26] 412 


Ф,1 [11:27] 413 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
diz = 2; = 2, (208 are [11:28 | ИП 
standard scores) 
V(d,,) [11:28а] un 
c (d d. үү) [11:29 J 4 l4 
г(Ч,,4, тт) [11: 29а] 415 
Чур „у [11:30 1 415 
(Чу) [11:302] 415 
V (d,) [11:31 1 415 
[11:3!а] 415 
d [11:32 1 415 
V(d,,) [11:322] 415 
а а УУ (11:33 1 416 
И [11:33а] 416 
в [11:34 1 416 
V(d. 2) [1 1: 34a] 416 
SPA [11: 34b] 417 
4,, and d= [11:35 1 417 
squared, critical ratios [11:36 1 417 
Reliability of d [11:37 | 419 
r, (triad) [11:38 1 420 
V(r, ) [11:39 ] 420 
г. [11:40 | TX 




KEY PHRASE OR SYMBOL 


525 


V(t 234) 


Азу» 


Best estimate of 2, 
Weighting to allow for 
reliability 


Effect of range upon 
reliability 


v/V. 
[^ (2) 


Effect of ап imposed 
v,/V, upon Гүр 


Effect of normal selective 
Processes upon г, 


X, та 3-variable problem 
Е 
b; and b, 


а,- the constant term 


Ky 1297 а 3-variable 
multiple alienation 
coefficient 


FORMULA NUMBER 


[11:41] 
[11:42 1 
[11:45 1 
[11:44 ] 


[11:45 ] 


[1 1: 46а] 


[11:47 1 
[11:48 1 
[11:49 1 
[11:50 ] 


[11:51] 
[11:52 1 


[11:53] 
[11:55 ] 


[11:56 ] 
[11:562] 


[11:57 1 
[1:58 1 


[11:59 ] 


[11:60 ] 
[11:62 ] 
[11:81 1 


PAGE 


434 


цу 


— даа Дий 


ир ee ee ee 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 

Ё, and B,,- standard [11:61] 435 

score regression [11:62] 435 

coefficients [11:82] yu 

[11: 83] Vul 

V(z,),- analysis of LI i: 63] 435 

variance 

Лос [11: 64] 435 

[11:65] 436 

[11:94] 445 

Vo „12 : [11:66] +36 

У,,- analysis of variance [11:67] 437 

dso. f. [11:68] 437 

Fy n- to test (бло - 0) [11:69] 437 

V ,- analysis of variance [11:70] 438 
when rj = 0 

d. o. f. [11:71] 438 

F, y-3 to test (5, - 0) [11:72] 438 
when ^0 

F, ноу to test (0, В.) [11:73] 438 
when д, = 9 

К.“ 2 [11:74] 439 

[11:83] 441 

[11:95] 445 

Еа [11:75] 439 

а вс. [11:76] +39 

[12:20] 461 

х [11:77] 439 




KEY PHRASE OR SYMBOL 
ЦЭР to test 

(8,127 01.2) 
У (60.2) 


A,- major correlation 
determinant 


As 24 


ô,- а major determinant 
d,- a major determinant, 


D,- a major determinant 
Hos Wy, and M, 

У, И, and V; 

Torr Moo» and г, 


Regression constants 
а, by, and b, 


FORMULA NUMBER 


[11:78 1 


[11:79 | 


[11:80 ] 
[12:26 ] 


11:84 ] 
[11:85 ] 
[11:86 | 
[11: 87 1 
[11:88 ] 
[11:89 1 
[11:90 ] 


[11:91] 
[11:92 ] 


11:93 ] 


Хол, 12 quadric regression (11:96 | 


И ,- analysis of variance 
d. o. f. 


И ,- analysis of variance 


d. o. f. 


[13:03 ] 
[11:97 ] 
[13:02 | 
[tts 101] 
[11:102] 


[11:104] 


[11: 105] 


PAGE 


Ww 
471 


yyl 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE

Е, y-317 to test need of 111:107) 448 
second degree regression 

F, y-417 to test need of [11:108] 448 


third degree regression 


тоз .- correlation ratio [1 1: 109] 450 
Giving estimates of 111:110) 451 
Population variance to 
errors for different [11:1 13] 452 
parabolic estimates 
,F^,- giving corrections [1:114] 452 
to r^ for shrinkage for to 
different parabolic [11:116] 452 


regressions 


Е?,- Е an unbiased corre- — (11:117| 452 
lation ratio, cf. [13:10] 


и ay nek to test adequacy [11:118] 453 


of so, 12... .1J 


Үл [11:119] 453 
А (11:1201 ч53 
о. о [12:01 1 454 
Ато еп F^ [12:02 1 454 
2012... n [12:03 ] 454 
a,- the constant term [12:05 ] 455 
ue [12:06 ] 455 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
Крот [12:07] 455 
2 [12:10] 456 
Analysis of V(z,) [12:11] 456 
Condition equations [12:13 a to n] 456 
ToN12. л [12: 14] 457 
TOA123 [12:15] 460 
Соло and Су о: [12:16] 461 
[12:17] 461 

[12:18] 461 

B,- а matrix [12:21] 464 
F yeni to test (6-5) [12:22] 469 
[12:24] 469 

Fy y-n-1 to test (f, 1) [12:25] 471 
А = D,- a major [12:25] 471 
determinant [11:76] 439 

Kp. 12... [12:27] 471 
[12:45] 477 

ола. дуп [12: 28] 471 
[12:47] 477 

Ву, В», etc.,- multiple [12:29] 472 

regression coefficients 

ен [12:30] 472 
И) [12:30] 472 
V,,- 2 from partial г (12: 31] 472 
У(Б,),- cf. [12:47] and [12:34] 474 


[12:49] 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE


F - to test (5,- 5,) — [12:35] 474 


1,N-n-1* 


у correction [12: 36] 474 


for shrinkage 


поно 277 tORtesE [12:37] 475 
27729975) 
(,- augmented matrix [12:38] 476 
а!!,- inverse elements [12: 39] 476 
R,- predictor matrix [12:40] 476 
r)',- inverse elements 12:41] 477 
В,- regression coefficient [12:42] 477 
таїгїх 12:44) 477 
A,- correlation coefficient [12:43] 477 
matrix Р 
У(В,) cf. [12:34] [12:47] 477 
12:48] 478 
[12 48a] 478 
У(А FAI [12:50] 478 
Fy yen-ir~ to test (А, -В,) [12:5] 478 
m ^ 12:52] 478 
DD [12:53] 479 
Хо дл, 12,1397 cubic regression [13:04] 183 
yos 1127 analysis of variance [13:07] 488 




KEY PHRASE OR SYMBOL 


d. o. f. 


Рано" = to test 


У) 


Е? after removal of 
linear trend 


€? after removal of 


quadric trend 
Equivalent percentiles 
Equivalent standard scores 


Equivalent estimated true 
Standard scores 


Equivalent ratios 


vo 


< 


Wie 
— ,- optimal weight 
“ihe 


M,- most reliable Mf 

Күүт Pearson 

Kp,- Pearson 

ò,- Craig 

ЖЫ negative moment 

Type VIII, IX, XI criterion 


Type Y criterion 


FORMULA NUMBER 


[13:06] 


[13:08] 


[13: 10] 


[13:11] 


[13: 12] 
[13:13] 
[13:14] 


113: 15] 


[13: 16] 


113: 16а] 


[13:18] 


[13: 19] 
[13:20] 
[13:21] 
[13:22] 
[13:23] 
[13:24] 
[13:25] 


492 


492 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
dy 
D Pearson curves [13:27] 512 
2 [13:27а] 512 
а„,- Carver [13:28] 513 
a, recursion formula [13:29] 513 
Giving а, b, b,, and b, [13:30] 514 
to 
[13:33] 514 
y,- unit rectangular [13:35] 515 
distribution 
y,- unit parabolic [13:36] 515 
distribution 
y,- unit straight line [13:37] 515 
distribution 
y,- unit exponential [13:38] 515 
distribution 
y,- Type XII [18:39] 516 
y,- Type XIII [13:01] 517 
Y,- Type Il [13:48] 518 
y,- Туре VII [13:53] 518 
Type VILI, IX, XI cubic in m [13:58] 519 
у,- Type VIII and IX [13:59] 519 
y,- Type XI [13: 60] 520 
у,- Туре У [13:67] 521 
у,- Туре 1 [13:73] 522 
у,- Type VI [13:74] 522 


у,- Type IV [13:75] 522 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 

V, ,- V of a derived [13:76] 523 

statistic 113: 78] 524 

[13: 80] 525 

[13:84] 526 

ү: [13:77] 524 

[13:79] 525 

[13:81] 525 

Tay ›- Г corrected for [13: 85] 527 

attenuation [11:25] 412 

[18:91] 529 

УС, (13: 88] 528 

у [13: 89] 529 

ЦЭР [13:90] 529 

1,- interval in a table [13:92] 539 

a's, t's, and A's in Table XIII | 540 

connection with tables 

P,- direct interpolation [13: 93] 539 

P',- inverse interpolation [13:94] 539 

А and 8,- inverse [13:95] 541 

interpolation [13:96] 541 

Ее ету [13:97] 541 
p р р E 

[13:98] 541 

113: 991 541 

Еж phy [13: 100] 542 

[13:101] 542 

[13:102] 542 

ат TED [13: 103] 543 


[13: 104] 543 


KEY PHRASE OR SYMBOL 
pi gut 
Y 

из (p) 

ш (9) 
133 
г(5 5) 
c(f 5) 
c(p,p, ) 
e£, 4) 
clf Ё) 
с( fa) 
ШАЎ 
ЦИ M,) 

c (m) 
ДАХ 
ШАА 
c(V V.) 


V(nj) 


(Фр) 
V (P) 


в (Р.Р) 


113: 
(13: 
[13: 
113: 
113: 
113: 
113: 
113: 
113: 
13: 
13: 
2116) 
2117] 




FORMULA NUMBER 


105] 
106] 


107] 
108] 
109] 
110] 
111) 
1121 
113] 
114) 
1151 


: 118] 
:119] 
:1241 
: 125] 


: 126] 
: 148] 


: 127] 
:128] 


: 129] 


:130] 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE 
Pap [13:131] 552 
r (00) [13:132] 552 
А [13: 133] 552 
г (0,0) [13: 134] 552 

[13: 135] 552 
r( jo) [13:136] 552 

[13: 137] 552 
r(n M) [13: 138] 552 
с(а5,,) 113: 139] 553 
r 61555) 113: 1401 553 
с(5 д3) [13: 141] 553 
ele 253) (13: 142] 553 
(4177) [13: 143] 553 

[10:41 ] 357 
|4 113: 1441 554 
V(ri,),- cf. [10:46] [13: 145] 554 
с: > 13: 146] 554 
V(V),- cf. [6:58] [13: 147] > 555 
c(V; V.) [13: 148] 555 

[13:126] 550 
У(с,,) 113: 1491 555 
c(V,c,,) (13: 150] 555 


с(У,с,-) [13: 151] 555 




KEY PHRASE OR SYMBOL 
€ (c; ;с,;) 
с (С, 3634) 


h, and h,,- intercepts in 
sequential analysis 


G,- a matrix 


8” 


FORMULA NUMBER 


113: 1521 
113: 153] 


113: 1541 
(13: 155] 
113: 1731 
113: 174] 


113: 156] 
13: 172] 


13: 159] 
[13: 183] 


[13: 160] 
[13: 161] 


[13: 162] 
[13: 170] 


[13:163] 
[13: 171] 


(13: 165] 


[13: 169] 
(13: 180] 


[13:175] 


113: 176] 
[13: 177] 


13: 178] 
[13: 179] 
[13: 181] 


[14:01 ] 


[14:02 ] 




PAGE


555 
555 


559 
559 
568 
568 


559 
568 


560 
569 


561 
561 


562 
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3 [14:05] 575 
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А expanded [14:13] 579 
A,- factors of [14:4] 580 
Simultaneous linear 114: 15] 580 
equations 
Solutions Гик 17] 581 
Шуу Of point binomial (14: 18] 581 
Moments [14: 19] 581 
to 
[14:25] 582 
Poisson [14:26] 582 
Mome nts [14:27] 582 
to 
[14:36] 583 
Hypergeometric 14: 37] 584 
Mo * 
ments [14:38] 585 




KEY PHRASE OR SYMBOL 


Г and factorials 


4n(X!) 
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Re numerical solution 
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Trigonometric functions 


Straight line 


FORMULA NUMBER 


(14: 44) 
to 
[14:47] 


[14:48] 
[14:49] 
[14:50] 


[14:65] 


[14: 70] 
[14:70 


[14:77] 
[14:78] 


[14:78] 
[14:79] 


[14:82] 


[14 83] 
114: 84] 




PAGE 


586 




KEY PHRASE OR SYMBOL FORMULA NUMBER PAGE
d,- distance to line [14: 100] 604 
Re rotated variables [1:101] 604 
à to 

[14:109] 605 
tan (26) [1u: 110] 605 
Plane 118:111) 605 

[1:112] 606 
4, - distance to plane [14:1 13] 606 
6,- angle between lines [14:1 16) 606 
0, - angle between line [и 117] 606 

and plane 
À,- Lagrange multipliers [14:118] 607 
Y,- growth curves [1u: 120] 608 
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[14:124] 608 
Binomial expansion [14:125] 609 
Polynomial expansion [14:126] 609 
N! ,- Stirling [1%: 121) 608 
Ре... 114: 128] 609 
y,- Fourier [1%: 129] 610 
f (x),- Taylor-Maclaurin (19: 130] 610 
J £ (x) dx,- Euler-Maclaurin [1u: 131] 611 
$, jı Kronecker's 680 
Ву, B, ... Bernoullian 679 

numbers 


Sum of geometric series 696 
Acceptance and rejection boun-
dary lines, 568


Acceptance region, 559 
Accuracy score on a test, 594 
Adjoint matrix, 579


Advisory Committee of Social 
and Economic Research, 26-28 


Affine transformation, 691 

Agriculture statistics, 620 

Aitken, A. C., 457 

Aitoff equal area projection, 
166 


Alignment chart, 100, 616-611, 
628-632 


Amenability to algebraic manip- 
ulation, 247-248 


Analysis of covariance, 460- 
461 


Analysis of variance, SEE Vari-
ance


Analytic geometry, 30 
Anchor test, 364-365
Anti-logarithm, 35 


A posteriori probabilities,
321-322


Applied Mathematics Panel, 559 
Archibald, R. C., 626 
Argument, 48, 86 

Array, 341 


Association, 195, 198 
Asymmetry, 248-252 


Attenuation, 333-337, 412, 526- 
529, 611 


Augmented determinants and ma- 
trices, 471, 475-476 


AUTOMOBILE FACTS, 63-64 
Average deviation, 221-230 


Average intercorrelation be- 
tween series, 397 


Averages, 234-274 
Average Sample Number, 563 


Average Sample Number Curve, 
563 
Azimuthal projection, 165 


Background of statistics, 24- 
56 


Background test, 25, 659-616 
Bar diagram, 100, 151 

Bartky, Walter, 559 

Bartlett, M. S., 25 


Basal year, or basal date, 70,
105, 109


Bateman, Harry, 626 
Barlow, Peter, 612-613 
Bell, Julia, 614 
Bernbaum, Z. W., 559 
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Bernoullian numbers, 609-611, 
617-618, 695 


Bessel, 293 
Bessel functions, 613, 617, 626 
Best fit, 348-349 


Beta function (including in- 
complete beta function), 618, 
623-624 


Binomial case, 564 


Binomial distribution, 276, 
315-318, 581-582 


BIRTH CONTROL REVIEW, 164 


Biserial correlation, —SEE Cor- 
relation, biserial 


Biserial regression, 372-374 
Biserial product moment r, 372 
Block diagram, 100, 166-173 
Boundary lines, 568 


Boundary values, rule for allo- 
cating, 131 


Bowley, A. L., 4, 95 
Bravais, A., 339 

Bregman, E. O., 282 
Brigham, Carl C., 97 
Briggsian logarithms, 697 
Brown, Carl, 431 

Brown, William, 120, 405 
Burgess, James, 289 


Calculus, 28, 37 

Camp, B. H., 585, 616 
Caption, 86 

Carver, H. C., 507-508, 513 
Categories, 13, 151-152, 336 
Causal relationship, 345 


Cell frequencies, Correlations 
and regressions of, 545-548 


Cell square contingency, 302 
Center of population, 159-163 


Central tendency, 234-274, 195 


Characteristic of a common 
logarithm, 696 


Charlier, C. V. L., 507 
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Chesire, Leone, 384 


Chi-square (χ²), 253, 284-286, 
302-303, 306-309, эв 325, 
507, 599, 613, 620, 623, 621 


Church, A. Е. R., 616 
Circle chart, 151 


Circular function, —SEE Trigo- 
nometric functions 


Class boundaries, 129-140 


Class frequency, 73, 154, 192, 
198, 311-325 


Class index, 122, 389-392, 394 
Class means, 392-395 
Classification of data, 80-81 


Classification of series, 81-82 


Coarseness of grouping, 333, 
388-395 
Coast and Geodetic Survey Maps, 
163 
Cochran, W. G., 330-331, 599 
Colcord, C. G., 620 
Collection of data, 75-76 
Combinations and permutations, 
695, 691 
Comparable measures, 339-340, 
494, 499-501 
Percentiles method, 499-500 
standard score method, 500 
estimated true standard score 


method, 500-501 
ratio method, 501 


Competitive cases in samples, 
9, 193, 424-425 
Comrie, L. J., 615, 620, 623, 626 


Condition equations, 456-457, 
461, 471-475 


Conjugate diameter, 343 


Conjugate elements in matrices 
and determinants, 573 


Consequent, 48 

Consistent statistics, 230 
Contingency, 194 
Contingency table, 100 


Continuous series, 69, 73 


Continuum, 73, 193-194 


INDEX 


Contour lines, 163, 198, 343 


Convergence, ratio test for, 
697 
Correction to η, 442-443 
Corrections to r 
for attenuation, 333-337, 412 


for coarseness of grouping, 
333-335, 393 

for fineness of grouping, 
333-335, 452 


for shrinkage, 333-335, 449- 
453, 468, 474 


for enforced dichotomizing, 
SEE Biserial r and tetra- 
choric r 
Correlation, 332-395 
Correlation, average, between 


series, 396-397 
Correlation, average rank, 396- 
397 
Correlation, biserial, 197, 
370-379 
Correlation, correction to r,— 
SEE Corrections to r 


Correlation, partial, 339 


Correlation, tetrachoric, 197, 
382-388, 613, 616, 617 


Correlation, two by two-fold, 
379-388 


Correlation, unlike signed 
pairs, 617 


Correlation between β regres- 
sion coefficients, 479 


Correlation between b regres- 
sion coefficients, 479 


Correlation between class fre- 
quencies, 198, 545-548 
between M and V, 214 
between M and U3, 214 


between М? and V, 214 


between M and a cell fre- 


quency, 548 
between Mj and Mo, 548-549 


between V4 and V5, 549-555 


Tan 


between b and constant term 
in a regression equation, 
553 

between согу and г, 553, 555 

between r's, 553, 595 


Correlation between ranks, 365- 
370 


Correlation chart, 353-354, 142 
Correlation coefficient, 339 


Correlation coefficient, dif- 
ference formula for, 365 


Correlation coefficient, error 


in a, 358-362 


Correlation coefficient, stan- 
dard error of, 360 


Correlation coefficient, sum 
and difference formula for, 
366 


Correlation coefficient, work 
formula for, 351-352 


Correlation, multiple, and re- 
gression, 433-442, 482-508 


Correlation of sums, 395-398 


Correlation ratio, 448-453, 
483-492 


Correlation ratio, multiple, 
453 
Correlation ratio, partial, 453 


Correlation ratio, unbiased, 
452-453, 483-492 


Correlation table, 340 
Cosens, C. R., 600 


Counter sample, and counter 
magnitude, 77 


Covariance, 350 


SEE also Product moment 
SEE also Correlation 


Covariance between percentiles, 
231 


Covariance of sums, 395 
Cowden, Dudley J., 457 
Craig, Cecil C., 507-511, 513 


Critical ratio, 232, 295, 319, 
359-360, 372, 504 
Cross-hatched plat, 173 
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Cross-hatching, 155 
Cross-section series, 68 
стот, ОИЕ 395 


Cube roots, 543-545, 613-613, 
652-656 


Cumulants, 211, 215-216 
SEE Table ХУ Е 


Cumulative frequency curve, 
141-149 


Curve fitting, 28, 37 
SEE also Frequency distribu- 
tions 


Curvilinear regression, 449-452 
Cycles, 70-71 


David, F. N., 358, 619 
Davis, H. T., 617 
Day, Edmund E., 95 


Decimal places to be retained, 
222-223 


Definite integral, 692 


Degrees of freedom, 205-206, 
303, 307, 308, 317-320, 
325-326, 329, 331, 354, 
362, 438, 446, 447, 469, 
472-473, 475, 482-483, 507 


Delta, Δ, as a subscript, 459- 
461 


Deming, 1. 5., 620 

DeMoivre, Abraham, 276, 277 
Derivatives, Tables of, 613 
Descriptive statistics, 21-23 
Design of experiments, 572 


Determinantal solution of 
multiple correlation rela- 
tionships, 441-446, 459, 471- 
475, 579-581 

Determinantal solution of si- 
multaneous equations, 580- 
581 

Determination coefficient, 504 

Determinants, 577-581 


cofactor, 578 
inversions in order, 578 




leading element, 578 
leading term, 578 
minor, 479, 578 
principal diagonal, 578 
sign factor, 578 
positive definite, 579 
Grammian, 579 

rank of, 579 


Deviation from an arbitrary 
origin scores, 216-217 


Deviation from the mean scores, 
216-217 


Dickson, Hamilton, 343, 348 


Difference between means, 209, 
471 


Difference between means of 
tail portion of a normal dis- 
tribution, 300-301 


Difference between percentiles, 
230-232 


Differences between β regres- 
sion coefficients, 478, 480- 
481 


Differences between fallible 
measures, 413-419 


Differences in a table, 35-36, 
539-541, 599-603 
Digamma function (including 


tri- and multi-gamma func- 
tions), 614, 618 


Discrete series, 59, 73 
Distributions 


forms of, 238-240, 506-522 
exponential, 512-515 
Mendelian, 512, 515, 518 
normal, 512 

parabolic, 512-515 
Pearson types, 506-522 
rectangular, 512, 515 


Type I, 522 

Type II, 511, 518 

Type III, 511, 517, 589 
Type IV, 522 

Type V, 511, 512, 521, 522 
Type VII, 511, 518 

Type VIII, 511, 518-520 
Type IX, 511-512, 518-520 


Type XI, 511-512, 518-521 
Type XII, 511, 516, 517 
Two-category, 511, 515-516 


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Dominant position, 86, 88, 92, 
95 

Doolittle, M. H., 457, 459 

Doolittle solution, modified, 


458-471 
Dot, . , as a subscript, 459, 
460 


Double dichotomies, 570 
Duffel, J. H., 614 
Duncan, Acheson J., 4 
Dunlap, Jack W., 616 
Dwight, H. B., 619 


Dwyer, P. S., 457 


Efficient statistics, 230 
Elderton, E. M., 616 


Elderton, W. P., 220, 240, 822, 
507, 522, 613 


Equivalent, 39, 50 


Error, propagation of, 23-24, 
41-42 


Error, proportionate, 42 
Error, systematic, 41 


Error of estimate, 349-350, 401- 
402, 403, 455 

Error of observation, 400 

Error of prediction, 15-17, 24, 
349-350 

Estimation, 332-395 


Euler-Maclaurin evaluation of 
definite integral, 610-611 


Euler polynomials and numbers, 
618 


Everett, P. F., 613 
Exponential, 33, 621, 624, 627 


Exponential equations, solu- 
tion of, 589 


Extrapolation, 34 
Ezekiel, Mordecai, 458 


F, SEE Variance ratio 
Factor analysis, 422, 695-696 


Factorials, 585-587, 609, 614, 
621 
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Fallible measure, 399-425 
Fawcett, C. D., 263 


Federal (U.S.) Works Agency, 
Works Project Administration, 
621-622 


Fieller, E. C., 616 

Finance, statistics, 615 
Fineness of grouping, 333, 452 
Fisher, R. A., 69, 212, 215-216, 
230, 249, 252, 310, 330, 
359, 439, 470, 474, 418, 
572, 620, 621, 624, 625 
Fisher's z, 620 
Fisher's z from r, 621 
Fitting curves, 301-303, 506 


Forms of distribution, 238-240 

Forsyth's Γ formula, 588 

Fourier series, 610 

Freeman, Harold, 559 

Frequency distributions, 614, 
616 

Frequency in a class, SEE Class 
frequency 


Frequency polygon, 100, 121- 
122, 134-140 

Further applications of se- 
quential analysis, 570 


Galton, Francis, 279-280, 331- 
348, 361, 362 
Gamma function, 585-588, 61 
бая, , ‚ 613, 
SEE also Digamma function 


Gamma function, incomplete, 
589, 619 


Garrett, Henry E., 59 
Gauss, 211, 293, 339, 349 


General purpose table, 87, 89- 
91, 94-95 


Geographic series, 62-63, 66- 
67, 72, 153-166, 193-194, 
198 

Geometric mean, 235, 248, 265- 
271 

Geometric series, 696 
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Glover, J. M., 615 
Gnomonic projection, 165 
Gompertz, 33, 608 


Goodness of fit, 303, 304, 318, 
507 


Gram polynomial, 41, 618 


Grammian determinants and ma- 
trices, 459 


Graphic methods, 98-184 
Graphs, 21, 27, 32-33 
Greek alphabet, 677 
Grouping, 129, 131 
Growth, 33, 115-116 


Growth curves, 65, 100, 115- 
120, 198, 608 


Growth series, 65, 15 


Gudermannian, 625 


Harmonic mean, 234-235, 248, 
271, 274 


Hartley, H. O., 620, 623, 625 
"ou , ‚ 623, D 


Henderson, James, 615 

Heron, David, 382 

Hilferty, Margaret M., 319, 599 
Histogram, 100, 121 
Historical series, 68 
Hoffman, Arthur C., 504 
Holzinger, Karl J., 696 
Homogeneous grouping, 428-429 
Homoscedasticity, 342, 373 
Horn, Daniel, 368 

Hotelling, Harold, 559, 696 
Hyperbolic functions, 621, 624 
Hypergeometric series, 583-585 
Hypothesis, null, 332, 355-356 


Identity matrix, 480 
Incomplete beta function, 310 
Independent measures, 15 

Index, 70, 115 

Index numbers, 198 
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Insurance statistics, 615, 619 
Integral function, 692 
Integrals, tables of, 613, 619 
Integration, 692 

Intermediate table, 93-95 
Internal mean, 236 
Interpercentile ranges, 230-232 


Interpolating values in a table, 
601-603 


Interpolation, 27, 538-543, 
622-623 


accuracy of, 541-543, 627 
two-point, 541, 622, 626 
three-point, 541, 622, 626 
four-point, 541, 622-623, 626 
harmonic, 619, 623 

inverse, 539-543, 622, 625 
inverse, quadric, 543, 626 


Interval, 122, 127, 539 
Interval, optimal, 531-538 
Invariance, 23, 51, 573 
Inverse matrix, 479-480 


Isogonal transformation, 691 


J-shaped curves, 240 
(includes "twisted 4 type") 

James, Vern, 432 

Jerome, Harry, 68 

Judgment of irrelevance, 6-9 


Judgment of sameness, 6-9 


K (Fisher) statistics, 212, 
215-216 


Kelley, Truman L., 189, 231, 
254, 289, 301, 326, 330, 
393, $085 418, 421, 424, 
427, 430, 458, 499, 511, 
588, 605, 612, 619, 623, 
626, 627 

Kelley-Salisbury iterativepro- 
cess, 457 


Kendall, M. G., 212, 582, 615 
Keynes, J. M., 47 

Kondo, T., 616 

Koren, John, 10 
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Kuder, б. F.a, 404 


Kurtosis, 195, 237, 242, 246- 
247, 248 


Kurtz, А. K., 616 


Labelling classes, 129-140 
Lag and lead, 492-497 
Lagrange multipliers, 607-608 


Lagrangian interpolation coef- 
ficients, 538-542, 622-623, 
626-627 


Laplace, 216, 277 
Latin squares, 572, 620 
Lead and lag, 492-497 
Least squares, 37 

Lee, Alice, 614, 615 
Legendre, A. M., 614 


Lexis' ratio (Lexian ratio, 
Coefficient of disturbancy), 
198, 696 


Likelihood ratio, 560 
Limits, natural, 104, 195, 497 
Linear restrictions, 355, 482 
Logarithmic chart, 110 


Logarithms, 27, 31-35, 34-35 
210-211, 614-615, 621, 696 


Logistic, 33 
Long, Н. L., 529 
Lowan, A. N., 621-622 


Macdonnell, W. R., 263 

Maclaurin series, 610 

Maher, Helen C., 519 

Makeham's mortality curve, 608 

Mantissa of a common logarithm, 
692 

Maps, 99, 100, 155, 158 

Mathematical statistics, 7-8 

Matrices, 573-577 


addition of matrices, 574 

adjoint of a square matrix, 
574, 576 

associative properties, 576 

conjugate elements, 574 
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Matrices 


diagonal, 576-577 

distributive properties, 576 

elements, 573-574 

identity matrix, 575-576 

inverse product, 576 

multiplication of, 472, 575, 
576, 577 

order of, 574 

pre- and post-multiplication 
of, 575 

scalar, 574-577 

symmetric, 574 

transpose, 574, 575 


Matrices in connection with 
multiple correlation, 459- 
460, 475, 481 


Maximum likelihood, 246, 254 


McNemar, Quinn, 458 
Mean, 207-210, 234-235, 253 


Mean, computation of, 216-222 
Mean, properties of, 218-219 


Mean, variance error of M from 
correlated data, 357 


Mean, weighted, 505-506 

Mean deviation, 227-230 

Mean of N ranks, 366 

Median, 229, 235, 240-248 
Medical statistics, 620 
Mental measurement, 404-405 
Mercator projection, 164-165 
Merrington, M., 620, 625 
Mid-parent measure, 338 

Mills, 4. Р,, 616 

Mises, Richard von, 7 


Mode, 128-129, 131-132, 236, 
258-265 


Modified Doolittle solution, 
458-471 


Molina, E. C., 583, 624 
Moments, 196, 210-227, 507 


Moments, computation of, 217- 
222 
Moments, infinite, 510 


Moments, negative, 510 
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Monotonic, 49, 52 

Moul, Margaret, 616 
Moving average, 258 
Multimodality, 195, 237 


Multiple correlation, and re- 
gression, SEE Correlation, 
multiple 


Multivariate analysis, 198 


National (U.S.) Bureau of Stan- 
dards Tables, 621-622 
Natural limits (SEE also "zero 


point"), 193, 194 
Napierian logarithms, 697 
Neyman, J., 559 
Nomographs, 616 
Nonlinear regression, 334, 442- 


453 

Normal bivariate distribution, 
348 

Normal correlation, 341, 343, 
348 


Normal distribution, 36, 214, 
259, 275-310, 347, 349, 
512, 529-531, 532-534, 538, 
564, 613, 615, 616, 621, 
623, 626-627, 639-656 

Normal equations, 456-457, 463- 
464, 471-475 

Normal equi-probable and mean 

ranges, 529-538 

Normal skewness, 316 

Normal kurtosis, 316-317 

Normal optimal interpercentile 

range, 231 

Normalizing a distribution, 
198, 592-593, 620 

Normalizing a variance ratio, 
310, 325-331, 599 

Norton, Н. W., 620, 624, 625 

Null hypothesis, 332, 355-356 

Number of classes to use, 133- 
134 


Numerical solutions of compli- 
cated simultaneous equations, 


591-592 
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Numerical solutionof parabolic 
and exponential equations, 
589-591 


Observed value, 52 

Observing eye, 88 

Ogive, 36, 100, 141-149 

One-sided alternative test of 

the mean, 567 

One-third sigma rule, 223 

Operating characteristic curve, 
557, 559, 560, 568 


Optimal interval for graphic 
portrayal, 531-538 


Ordered variable, 336 
Orthogonal, 53 

Orthogonal polynomial, 620-621 
Orthogonal transformation, 697 
Orthographic projection, 165 


p as a proportion in a normal 
distribution, 374 


p as a proportion corresponding 
to a variance ratio, 491-492 


Pairman, Eleanor, 614, 618 


Parabola, 33 
Parabolic equations of high 
degree, solution of, 589 


Parabolic regression, 334-335, 
355, 442-453 


Parameter, 355, 432 
Parent population, 4-5 


Partial correlation,—SEE Mul- 
tiple correlation 


Patton, A. C., 95 


Pearson, Egon S., 529, 614, 
625 


Pearson, Karl, 240, 244, 245, 
249, 253, 255, 258, 263, 
264, 289, 310, 339, 
343, 364, 367, 382, 386, 
392, 411, 430, 431, 507, 
522, 548, 559, 583, 
584, 613, 614, 615, 
616, 619 


Pearson system of curves, 240, 


251, 253, 613 
Pearson Type I curve, 264 
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Pearson Type 111 curve, 251-252, 
264 


Pearson Type IV curve, 264 
Peirce, B. O., 613, 619 

Pentad function, 422 
Percentile chart, 100, 141-149, 


195 
Percentiles, 95, 100, 141-149, 
230-232 


Period, 71, 193, 482-492 


Periodogram analysis, 198, 482- 
492 


Permutations, 693 
Perspective, 166-173 
Pictogram, 100, 151 

Pie chart,—SEE Circle chart 
Point binomial, 581-582 
Poisson, 564 


Poisson distribution, 316, 570, 
582-583, 599, 614, 615 


Polynomial distribution, 609 
Population, center of, 159-163 
Population, parent, 4-5 


Population statistics, 204-227, 
451-452 


Powys, А. Wey 263 

Predictand, 387 

Prediction equation, 15-19 
Predictor, 387, 464 
Presentation of results, 19 
Pretorius, J., 616 

Price index, 115 


Price ratios (SEE also Ratio), 
114, 268-269 


Primary table, 87 

Principal components, 572 

Probability, 28, 37, 78 

Probability, a priori, 312-313, 
321, 324 

Probability, a posteriori, 322, 
324 

Probability function,—SEE 

Normal distribution 
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Probability of joint occur- 
rence, 319-320 


Probable error, 295-29" 
Product moment,—SEE Covari- 
ance, 350 
Product moments, higher, 554 
Proportion, 193, 194, 196, 198 
Proportion, moments of a, 555 
Proportions, correlation and 
regressions of, 545-548 
Pseudo-temporal series, 64 


Publication, decimal places to 
be kept in, 222-223 


Quadratic mean, 235 

Quadrature, 697 

Qualitative series, 62-63, 67, 
72-74, 150-152, 153-154, 
194-197, 333 

Qualitative-spatial series, 63, 
67 

Qualitative-temporal series, 
62-63, 67 

Quantification of qualitative 
data, 311-313 

Quantitative series, 63, 67, 
72, 73, 74, 120-149, 154, 
194-197, 333-334, 336 


Quartile deviation, 231-232, 
294 


Quetelet, L. A. J., 277-279 
Quotients, 501-504 


r x 10, SEE Correlation 
Radians, 598 

Random sampling numbers, 621 
Range, SEE Variability in range 


Rank correlation, 365-370, 396- 
398 


Ratio, 70, 198 
Rational function, 697 
Rational integral function, 697 
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Ratio test for convergence, 697 
Read, Cecil B., 159 

Reduced measures, 344 
Regression, 332-395 


Regression, nonlinear, 335, 
338, 449-452 


Regression coefficient, vari- 
ance error of, 357 


Regression line, trustworthi- 
ness of a point upon, 363-365 
Rejection region, 559 


Relative time chart, 100, 108, 
107, 209 


Reliability, triad formula for, 
420-421 


Reliability coefficient, 403, 
404, 423-427 


Reliability of measurement, 
420-425 


Reliability step-up formula, 
406, 407, 408 


Residual, 18 
Retrospective chart, 107 
Reversion, 338, 341 
Rhind, A. J., 508, 614 
rho = p , 365-370 


rho, corrected for ties in 
rank, 368-370 


Richardson, M. W., 404 


Rietz, Henry Lewis, 69, 45 
507, 513 


Robinson, G., 529, 589, 610 
Romanovsky, V., 316, 360, 581-582 


Rotated variables, functions 
of, 604-605, 619 


Rounding off answers, 222-223 

Rulon, Phillip J., 284-285, 
403, 405 

Saffir, Milton, 381, 617 

Salisbury, Р. S., 457 

Salvosa, L. R., 252, 517 


Sample, 4-7, 194-195, 337, 355, 
473 


Sample number, 556 


Sampling, 5, 6-7, 194-195, 337, 
› 47 

Satakopan, V., 620-621 

Scarborough, James B., 45, 589 

Scatter diagram, 340 

Scedasticity, 342 

Score, 53, 125-127 

Seasonal fluctuations, 70-71 

Secrist, Horace, 134 

Segmented bar diagram, 151 


Semi-interquartile range, 231- 
232 


Semi-logarithmic chart, 110- 
115 


Seminvariants, 211 


Sequential analysis, 555-570, 
572 


Series, statistical, 4-5, 62-82 
geographic, 62-63, 66, 67, 72 
qualitative, 62-63, 67, 72, 
73 
quantitative, 63, 67, 73-74 
spatial, 62-63, 66, 67, 72 
temporal, 62, 63, 67, 69-70 

Shading, 155 

Shen, Eugene, 407, 420-421 

Sheppard, W. F., 218, 615, 625 


Sheppard's corrections, 218, 
361, 392 


Shrinkage in r, 333, 450-452, 
468, 474 


Significant figures, 38-45 


Simultaneous equations, solu- 
tion of, 580-581,—SEE also 
Doolittle method 


Sinusoidal interrupted projec- 
tion, 166 


Skewed distribution, 134-140 


Skewness, 195, 237, 247, 248- 
252, 269-271, 316 


Skewness, Fisher, 248-249 
Skewness, Pearson, 248-249 
Skewness, Pearson By, 248-249 
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Slope, 349 
Smith, B. Babington, 529, 615 
Smith, James G., 4 


Smithsonian mathematical ta- 
bles, 624 


Snedecor, George W., 310 
Soper, H. E., 375 
Space, two dimensions, 603-605 


Space, three or more dimensions, 
605-606 


Spatial series, 62-63, 66-67, 72 


Spearman, C., 367-368, 405, 
406, 412, 696 


Spearman's correction for at- 
tenuation, 412, 617 


Spearman's foot-rule formula 
for correlation, 367 


Spearman's rank formula for 
correlation, 367, 617 


Spearman's reliability coef- 
ficient, 405, 617 


Special purpose table, 87, 88, 


91-93, 94 
Split test reliability, 402 
405 


Spot maps, 198 
Square contingency, 302 


Square roots, 543-544, 612, 
617, 621, 621, 652-656 
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Stable features, 185, 198 
Stabler, Е. 8., 529 


Standard deviation, 201-202 
SEE also Variance 


Standard deviation of mean, 
207-210 


Standard error,—SEE also Vari- 
ance error 


Standard error, 294 


Standard error of estimate, 
364, 403, 407, 592-593 


Standardized test, 364-365 


Standard scores, 216, 293, 333, 
454 


Statistic, 53-54 
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Statistics of attributes, 311- 
331 

Statistical analysis, 13, 22-24 

Statistical processes, 74-79 

Statist. Research Group, 559, 570 

Statistical series, 3-5, 62-85 


Statistical series, geographic, 
62-63, 66, 67, 72 


Statistical series, qualita- 
tive, 62-63, 67, 72, 73 


Statistical series, quantita- 
tive, 63, 67, 73-74 


Statistical series, spatial, 
62-63, 66, 67, 72 


Statistical series, temporal, 
62-63, 67, 69-70 


Statistical tables, 84-97 


Stereoscopic presentation, 153, 
159 


Stevens, Stanley Smith, 529 
Stippling, 155 


Stirling's approximation to the 
factorial, 609-610 


Stochastic series, 312 

Stoessiger, B., 616 

Stub, 86, 87, 89-90 

"Student's" t, 284-286, 306-310, 
320, 351, 620, 625 

Subordination, 498 


Summation method for computa- 
tion of the mean and of the 
moments, 220-222 


Summarizing statistics, 57, 73- 
74 


Systematic error, 41 


t,—SEE "Student's" t 
Tables, 612-658 


Tables, expanding by interpo- 
lating, 601-603 


Tabulation of data, 76-77 
Taylor's series, 610 


Temporal series, 62-63, 69-70, 
101-120, 193, 482-497 
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Terminal statistics, 79, 233  Transmutation, 56, 340-341 
248, 


Test of variability, 516 


Tetrachoric correlation, —SEE 
Correlation, Tetrachoric 


Tetrad, 421-422 

Tetrad difference, 421-422 

Thiele, T. N., 212, 264 

Thompson, A. J., 615 

Thompson, Catherine M., 621, 
623, 625 

Thomson, Godfrey H., 120, 696 

Thorndike, E. L., 272, 282-283 


Three-dimensional portrayal, 
166-172 


Thurstone, L. L., 59, 384, 617, 
696 


Time chart, 100, 103-110 
Time limit test, 272 
Time ratios, 115 


Time reversal test of an index, 
268-269 
Tippett, L. H. C., 529-531, 
5, 616 

Tolley, H. R., 458 

Transformation, 51, 56, 193, 
195, 198, 297-298, 340-341 

Transformation of F, 310, 325- 
331 

Transformations, cube root, 599 


Transformations, square root, 
598-599 


Transformations of percentage 
positions into scores, 592- 
598 


Transformations of ranks into 
equally reliable scores, 592- 
598 


Transformations of ranks into 
scores, 592-598 


Transformation to produce lin- 
ear regression, 335 


Transforming rank and percen- 
tage positions into quanti- 
tative scores, 592-598 


Trend, 70, 193, 483 
Triad, 420 


Trigonometric functions, 603, 
613, 619, 621, 624-625 


True ability statistics, 407- 
408-424 


Trustworthiness of various 
measures, 236-238 


Truth, 197 
Two-point distribution, 313-314 
Two-sided alternative, 567 


Unimodal, or "I" or "A" curves, 
238, 240 


U. S., Subdivisions of, 155-157 


U. S. Bureau of Standards (Fed- 
eral Works Agency: Works Pro- 
ject Administration for the 
State of New York), 289 


U. S. Census, 16th Abstract, 
156-157 


Vantage point, 167 
Variability, 15, 195, 199-233 


Variability in range, effect 
of, 425, 432, 616, 625 


Variables, types of, 336 
Variance, 15, 199 


Variance, analysis of, 332, 
345-346, 354-358, 438, 440, 


442, 446-453, 455-456, 468- 
469, 470-471, 475, 477-478, 
594-595 

Variance, computation of, 216- 
222 

Variance error, derivation of, 
523-529 


Variance error, derivation of 


binomial expansion method, 
523 


Variance error, derivation of 
differential method, 524 
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Variance error, derivation of 
logarithmic differentials 
method, 526-529 


Variance error, derivation of 
Taylor series method, 525 


Variance error 


of a correlation coefficient 
corrected for attenuation, 


526-529 

of a point on a regression 
line, 364 

of a regression coefficient, 
440 


of biserial r, 375 


of estimate, 349-350, 401- 
402, 407, 436, 450 


of estimate in multiple cor- 
relation, 436 


of partial r, 473 

of partial z, 473 

of r, 360, 554 

of tetrad, 422 

of triad reliability coef- 
ficient, 421 


Variance errors of moments, 


223-227 
Variance of a difference, 203, 
398 


Variance of an array, 347 
Variance of a proportion, 545 


Variance of a sum, 395, 397, 
407 


Variance of Dio» 478, 553 


Variance of covariance, 555 
Variance of mean, 207-210 
Variance of N ranks, 366 
Variance of ф, 382 

Variance of гү, 386-388 
Variance of sums of ranks, 397 


Variance of the constant term 
in a regression equation, 554 


Variance of V, 223, 555 
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Variance ratio, 198, 304, 306- 
310, 319, 325-326, 331, 
355-358, 312-313, 381, 438- 
439, 430-441, 947-448, 451- 
473-475, 477-478, 488, 503- 
504, 620, 625, 626 


Vector, 691 


Wald, A., 559 
Wald's probability ratio, 559 
Waldron, L. R., 376 

Walker, Helen M., 11, 45 


Weighted sums, statistics of, 
395-398 


Weighting consequent to reli- 
ability, 419-425 


Weighting to secure the most 
reliable mean, 505-506 


Wherry, R. J., 474 

Whittaker, E. T., 529, 589, 610 
Wilder, Marfan, 61 

Wilson, Edwin B., 319, 599 
Wolfowitz, J., 559 

Work limit test, 272 


X (and & scores, 217 


Yates, F., 310, 320, 330, 620, 
624, 625 


Yerkes, Robert M., 431 


Young, Benjamin F., 47, 59, 95 


Yule, G. Udny, 68, 212, 379- 
382, 439, 529, 582 


z, — r into z (Fisher), 358- 
360 


Zero point, natural, 69, 103- 
104, 127 
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